Sir:

This is an appeal from an Office Action dated June 4, 2009 ("Final Office 
Action"), in which claims 1-18 were finally rejected. The Appellant respectfully requests 
that the Board of Patent Appeals and Interferences ("Board") reverse the final rejection
of claims 1-18 of the present application. The Appellant notes that this Appeal Brief is 
timely filed within the two-month period for reply that ends on November 4, 2009 (the 
Office date of receipt of the Notice of Appeal being September 4, 2009). 
REAL PARTY IN INTEREST
(37 C.F.R. § 41.37(c)(1)(i))

Broadcom Corporation, a corporation organized under the laws of the state of
California, and having a place of business at 5300 California Avenue, Irvine, California
92617, has acquired the entire right, title and interest in and to the invention, the 
application, and any and all patents to be obtained therefor, as set forth in the 
Assignments recorded at Reel 015075, Frame 0264 in the PTO Assignment Search 
room. 

RELATED APPEALS AND INTERFERENCES 
(37 C.F.R. § 41.37(c)(1)(ii))

The Appellant is unaware of any related appeals or interferences. 



STATUS OF THE CLAIMS 
(37 C.F.R. § 41.37(c)(1)(iii))

Claims 1-18 were finally rejected in the Final Office Action mailed June 4, 2009. 
Pending claims 1-18 are the subject of this appeal. 

The present application includes claims 1-18, which are pending in the present 
application. Claims 1, 4-6, 9-11 and 14-18 stand rejected under 35 U.S.C. § 103(a) as 
being unpatentable over U.S. Patent No. 5,781,696, by Oh et al. ("Oh"), in view of U.S. 
Patent No. 6,915,263, by Chen et al. ("Chen"). See Final Office Action at pages 4-9. 

Application Serial No. 10/803,420
Appeal Brief in Response to Final Office Action of June 4, 2009

Claims 2-3, 7-8 and 12-13 stand rejected under 35 U.S.C. § 103(a) as being
unpatentable over U.S. Patent No. 5,781,696, by Oh et al. ("Oh"), in view of U.S. Patent
No. 6,915,263, by Chen et al. ("Chen"), and further in view of U.S. Patent No.
5,684,829, by Kizuki et al. ("Kizuki"). See Final Office Action at pages 9-11.

The Appellant identifies claims 1-18 as the claims that are being appealed. The
text of the pending claims is provided in the Claims Appendix. 



STATUS OF AMENDMENTS 
(37 C.F.R. § 41.37(c)(1)(iv))

The Appellant has not amended any claims subsequent to the final rejection of 
claims 1-18 mailed on June 4, 2009. 

SUMMARY OF CLAIMED SUBJECT MATTER 
(37 C.F.R. § 41.37(c)(1)(v))

Independent claim 1 recites the following:

A method for speeding up an encoded original audio signal, said original audio
signal having an original frequency and original playback speed,[1] said method
comprising:

receiving the encoded original audio signal;[2]
retrieving frames of the original audio signal;[3]
skipping frames at a rate according to a desired playback speed;[4]
wherein said desired playback speed is greater than the original playback
speed;[5]
applying a window function to the remaining frames;[6]
converting the signal with the windowed frames from digital to analog
format;[7] and
using the original frequency to playback the analog format signal.[8]

[1] See present application, e.g., at page 3, lines 2-5; Figure 2, Figure 3; Figure 5; Figure 6;
Figure 7.
[2] See present application, e.g., at page 3, lines 26-27; page 7, line 32 - page 8, line 3; page 10,
lines 16-20; page 12, lines 11-13; page 13, lines 11-12; Figure 2 (213); Figure 3 (421); Figure 5
(209); Figure 6 (401); Figure 7 (301).
[3] See id., e.g., at page 3, lines 27-28; page 8, lines 2-3; page 11, lines 1-4; page 12, lines 20-23;
page 14, lines 7-10; Figure 5 (203); Figure 6 (407); Figure 7 (323).
[4] See id., e.g., at page 3, lines 28-29; page 8, lines 3-7; page 11, lines 8-19; page 12, lines 23-27;
page 14, lines 14-28; Figure 2 (212); Figure 3 (423); Figure 5 (202); Figure 6 (409); Figure 7
(325).
[5] See id., e.g., at page 3, lines 12-14; page 8, lines 3-7; page 11, lines 8-19; page 12, lines 23-27;
page 14, lines 14-28; Figure 2 (212); Figure 3 (423); Figure 5 (202); Figure 6 (409).
[6] See id., e.g., at page 3, lines 29-30; page 8, lines 8-15; page 11, lines 20-27; page 12, line 28 -
page 13, line 3; page 14, line 29 - page 15, line 5; Figure 2 (214); Figure 3 (425); Figure 5 (204);
Figure 6 (410); Figure 7 (325).
[7] See id., e.g., at page 3, line 30 - page 4, line 1; page 8, lines 16-20; page 11, lines 28-30; page
13, lines 4-8; page 15, lines 6-9; Figure 3 (427); Figure 5 (201); Figure 6 (411); Figure 7 (327).
[8] See id., e.g., at page 4, lines 1-2; page 8, lines 16-20; page 12, lines 1-7; page 13, lines 4-8;
page 15, lines 10-14; Figure 2 (211); Figure 5 (201).

Claims 2-5 and 16 are dependent upon claim 1.

Independent claim 6 recites the following:

A machine-readable storage having stored thereon, a computer program having
at least one code section that speed up an encoded original audio signal, said original
audio signal having an original frequency and original playback speed, the at least one
code section being executable by a machine for causing the machine to perform
operations[9] comprising:

receiving the encoded original audio signal;[10]
retrieving frames of the original audio signal;[11]
skipping frames at a rate according to a desired playback speed;[12]
wherein said desired playback speed is greater than the original playback
speed;[13]
applying a window function to the remaining frames;[14]
converting the signal with the windowed frames from digital to analog
format;[15] and
using the original frequency to playback the analog format signal.[16]

[9] See present application, e.g., at page 3, lines 2-9; Figure 2, Figure 3; Figure 5; Figure 6;
Figure 7.
[10] See id., e.g., at page 3, line 10; page 7, line 32 - page 8, line 3; page 10, lines 16-20; page 12,
lines 11-13; page 13, lines 11-12; Figure 2 (213); Figure 3 (421); Figure 5 (209); Figure 6 (401);
Figure 7 (301).
[11] See id., e.g., at page 3, lines 10-11; page 8, lines 2-3; page 11, lines 1-4; page 12, lines 20-23;
page 14, lines 7-10; Figure 5 (203); Figure 6 (407); Figure 7 (323).
[12] See id., e.g., at page 3, lines 11-12; page 8, lines 3-7; page 11, lines 8-19; page 12, lines 23-27;
page 14, lines 14-28; Figure 2 (212); Figure 3 (423); Figure 5 (202); Figure 6 (409); Figure 7
(325).
[13] See id., e.g., at page 3, lines 12-14; page 8, lines 3-7; page 11, lines 8-19; page 12, lines 23-27;
page 14, lines 14-28; Figure 2 (212); Figure 3 (423); Figure 5 (202); Figure 6 (409).
[14] See id., e.g., at page 3, line 14; page 8, lines 8-15; page 11, lines 20-27; page 12, line 28 - page
13, line 3; page 14, line 29 - page 15, line 5; Figure 2 (214); Figure 3 (425); Figure 5 (204);
Figure 6 (410); Figure 7 (325).
[15] See present application, e.g., at page 3, lines 15-16; page 8, lines 16-20; page 11, lines 28-30;
page 13, lines 4-8; page 15, lines 6-9; Figure 3 (427); Figure 5 (201); Figure 6 (411); Figure 7
(327).
[16] See id., e.g., at page 3, lines 16-17; page 8, lines 16-20; page 12, lines 1-7; page 13, lines 4-8;
page 15, lines 10-14; Figure 2 (211); Figure 5 (201).

Claims 7-10 and 17 are dependent upon claim 6.



Independent claim 11 recites the following:

A system that speeds up an encoded original audio signal, said original audio
signal having an original frequency and original playback speed,[17] the system
comprising:

at least one controller configured to receive the encoded original audio
signal;[18]
the at least one controller configured to retrieve frames of the original
audio signal;[19]
the at least one controller configured to skip frames at a rate according to
a desired playback speed;[20]
wherein said desired playback speed is greater than the original playback
speed;[21]
the at least one controller configured to apply a window function to the
remaining frames;[22]
the at least one controller configured to convert the signal with the
windowed frames from digital to analog format;[23] and
the at least one controller configured to use the original frequency to
playback the analog format signal.[24]

[17] See present application, e.g., at page 3, lines 2-5 and 18; Figure 2, Figure 3; Figure 5;
Figure 6; Figure 7.
[18] See id., e.g., at page 3, lines 18-19; page 7, line 32 - page 8, line 3; page 10, lines 16-20; page
12, lines 11-13; page 13, lines 11-12; Figure 2 (213); Figure 3 (421); Figure 5 (209); Figure 6
(401); Figure 7 (301).
[19] See id., e.g., at page 3, lines 18-20; page 8, lines 2-3; page 11, lines 1-4; page 12, lines 20-23;
page 14, lines 7-10; Figure 5 (203); Figure 6 (407); Figure 7 (323).
[20] See present application, e.g., at page 3, lines 18-21; page 8, lines 3-7; page 11, lines 8-19; page
12, lines 23-27; page 14, lines 14-28; Figure 2 (212); Figure 3 (423); Figure 5 (202); Figure 6
(409); Figure 7 (325).
[21] See id., e.g., at page 3, lines 12-14; page 8, lines 3-7; page 11, lines 8-19; page 12, lines 23-27;
page 14, lines 14-28; Figure 2 (212); Figure 3 (423); Figure 5 (202); Figure 6 (409).
[22] See id., e.g., at page 3, lines 18-19 and 22; page 8, lines 8-15; page 11, lines 20-27; page 12,
line 28 - page 13, line 3; page 14, line 29 - page 15, line 5; Figure 2 (214); Figure 3 (425);
Figure 5 (204); Figure 6 (410); Figure 7 (325).
[23] See id., e.g., at page 3, lines 18-19 and 23-24; page 8, lines 16-20; page 11, lines 28-30; page
13, lines 4-8; page 15, lines 6-9; Figure 3 (427); Figure 5 (201); Figure 6 (411); Figure 7 (327).
[24] See id., e.g., at page 3, lines 18-19 and 24-25; page 8, lines 16-20; page 12, lines 1-7; page 13,
lines 4-8; page 15, lines 10-14; Figure 2 (211); Figure 5 (201).

Claims 12-15 and 18 are dependent upon claim 11.

GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL
(37 C.F.R. § 41.37(c)(1)(vi))

Claims 1, 4-6, 9-11 and 14-18 stand rejected under 35 U.S.C. § 103(a) as being
unpatentable over U.S. Patent No. 5,781,696, by Oh et al. ("Oh"), in view of U.S. Patent
No. 6,915,263, by Chen et al. ("Chen"). See Final Office Action at pages 4-9.

Claims 2-3, 7-8 and 12-13 stand rejected under 35 U.S.C. § 103(a) as being
unpatentable over U.S. Patent No. 5,781,696, by Oh et al. ("Oh"), in view of U.S. Patent
No. 6,915,263, by Chen et al. ("Chen"), and further in view of U.S. Patent No.
5,684,829, by Kizuki et al. ("Kizuki"). See Final Office Action at pages 9-11.
ARGUMENT
(37 C.F.R. § 41.37(c)(1)(vii))

In the Final Office Action, claims 1-18 stand rejected under 35 U.S.C. § 103(a) as 
being unpatentable over various combinations of Oh, Chen and Kizuki. 

I. The Proposed Combination of Oh and Chen Does Not Render Claims 1, 4-6, 
9-11 and 14-18 Unpatentable 

The Appellant turns to the rejection of claims 1, 4-6, 9-11 and 14-18 as being 
unpatentable over Oh in view of Chen.

A. Rejection of Independent Claims 1, 6 and 11

With regard to the rejection of independent claims 1, 6 and 11 under 35 U.S.C. § 103(a), the
Appellant submits that the combination of references cited in the Final Office Action fails 
to disclose, for example, at least the limitations of "skipping frames at a rate according 
to a desired playback speed... applying a window function to the remaining frames," as 
recited in Appellant's independent claims 1 and 6; and "the at least one controller 
configured to skip frames at a rate according to a desired playback speed... the at least 
one controller configured to apply a window function to the remaining frames," as 
recited in Appellant's independent claim 11. 

With regard to "skip[ping] frames at a rate according to a desired playback 
speed... apply[ing] a window function to the remaining frames," the Final Office Action 
acknowledges that Oh fails to teach applying a window function to the remaining 
frames. (Final Office Action, Page 5, Lines 1-2). However, the Final Office Action
alleges that Chen's disclosure of muting frames by applying an attenuation function or 
window to the frame to soften the mute discloses the Appellant's claim limitations. (Final 
Office Action, Page 5, Line 16 - Page 6, Line 13). The Appellant notes that Chen fails 
to remedy the deficiencies of Oh for several reasons. 

First, the Appellant notes that Chen fails to disclose skipping frames. Thus, 
Chen also fails to disclose "apply[ing] a window function to the remaining frames."
The Final Office Action alleges that Chen's disclosure of muting frames teaches 
skipping frames; however, the Appellant notes that muting frames and skipping frames 
are different. For example, claim 1 recites "[a] method for speeding up an encoded 
original audio signal... comprising... skipping frames at a rate according to a desired 
playback speed." In other words, the playback speed is sped up by skipping frames. 
Put another way, as defined in the Appellant's claims, skipped frames are not played 
back in order to speed up a playback speed. Chen's muting has no effect on the 
playback speed. Rather, Chen's muted frames are played back and the number of 
muted frames merely impacts the length of the silence period. (See e.g., Chen Column 
2, Lines 54-64 and Column 7, Lines 15-20). 
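
For illustration only, the distinction between skipping and muting can be sketched in a few lines of Python. The sketch is not drawn from the present application, Oh or Chen; the frame representation, the function names (`skip_frames`, `mute_frames`, `duration_seconds`) and the sample values are hypothetical assumptions chosen solely to show the effect of each operation on playback duration at a fixed output frequency.

```python
from typing import List, Set

Frame = List[float]  # hypothetical: one frame of decoded samples


def skip_frames(frames: List[Frame], keep_every: int) -> List[Frame]:
    """Keep one frame out of every `keep_every`; the others are not played back."""
    return [f for i, f in enumerate(frames) if i % keep_every == 0]


def mute_frames(frames: List[Frame], muted: Set[int]) -> List[Frame]:
    """Replace the selected frames with silence; every frame is still played back."""
    return [[0.0] * len(f) if i in muted else f for i, f in enumerate(frames)]


def duration_seconds(frames: List[Frame], sample_rate_hz: int) -> float:
    """Playback time when the samples are output at the original frequency."""
    return sum(len(f) for f in frames) / sample_rate_hz


# Hypothetical signal: 8 frames of 4 samples, played back at 8 samples per second.
original = [[1.0] * 4 for _ in range(8)]
skipped = skip_frames(original, keep_every=2)
muted = mute_frames(original, muted={1, 3, 5, 7})

print(duration_seconds(original, 8))  # 4.0
print(duration_seconds(skipped, 8))   # 2.0, i.e. twice the original speed
print(duration_seconds(muted, 8))     # 4.0, duration unchanged
```

Under these assumptions, skipping every other frame halves the playback time, whereas muting the same number of frames leaves the playback time unchanged and merely inserts silence.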

Although, in one instance, Chen does refer to a muted frame as "skipped," the 
Appellant notes that nowhere in Chen is there any disclosure regarding not playing back 
a muted frame. If Chen did not play back the muted frames (and instead those frames 
are "skipped" as suggested by the Examiner), then there would be no silence period 
and Chen would be rendered inoperable. Therefore, the context of the term "skipped"
used in the one instance in Chen is clearly different than the terms "skip" and "skipping"
as defined in Appellant's independent claims 1, 6 and 11. Thus, because Chen clearly
fails to disclose skipped frames (i.e., frames not played back), Chen fails to remedy the
deficiencies of Oh in that the combination of references cannot disclose "apply[ing] a
window function to the remaining frames," as set forth in Appellant's independent
claims 1, 6 and 11.

Second, even if Chen's muted frames could be considered skipped frames 
(which they clearly are not), the Appellant notes that Chen discloses applying the 
attenuation function or window to the current frame (i.e., error frame or frame to be 
muted) to soften the mute. For example, Chen discloses receiving a current frame; 
determining whether a first error sum is greater than zero for the current frame; if the 
first error sum is not greater than zero, performing a normal decode of the current 
frame; if the first error sum is greater than zero, determining whether a second error 
sum is greater than a tolerance value; if the second error sum is less than the tolerance 
value, performing a normal decode of the current frame; and if the second error sum 
is greater than the tolerance level, muting the current frame by, for example, 
"apply[ing] a soft mute to the current frame " or applying a frame repeat process. 
(Chen, Figure 4 and Column 7, Line 9 through Column 10, Line 26 (emphasis added)). 
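
The decision flow summarized above can be sketched as follows. The sketch follows only the Appellant's summary of Chen's Figure 4 and is not Chen's implementation; the function name, argument names and return labels are hypothetical, and treating a second error sum exactly equal to the tolerance as a normal decode is an assumption, since the summary does not address that case.

```python
def chen_frame_action(first_error_sum: float,
                      second_error_sum: float,
                      tolerance: float) -> str:
    """Return the action taken on the current frame (hypothetical labels)."""
    # No errors detected for the current frame: decode it normally.
    if first_error_sum <= 0:
        return "normal_decode"
    # Errors detected and above the tolerance: mute the current frame
    # (soft mute via a window function, or a frame repeat in the alternative).
    if second_error_sum > tolerance:
        return "mute_current_frame"
    # Errors detected but within the tolerance (equality is an assumption).
    return "normal_decode"


print(chen_frame_action(0, 0, 3))  # normal_decode
print(chen_frame_action(2, 1, 3))  # normal_decode
print(chen_frame_action(2, 5, 3))  # mute_current_frame
```

In every branch of this sketch the current frame is either decoded or muted; no branch removes the frame from playback.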
With regard to applying a soft mute to the current frame, Chen discloses "an
attenuation function or 'window' was applied to the error frame to soften the mute."
(Chen, Column 2, Lines 14-15 (emphasis added)). Chen further discloses "a number of
different muting operations can be performed to mute the current frame. In the
preferred embodiment, a smooth muting with zeros can be applied to decline the audio
signal at a given rate according to a window function . . . ." (Chen, Column 9, Lines
23-27 (emphasis added)). The Appellant further notes that Chen's Figure 6 "illustrates an
audio signal in accordance with a muted audio frame and a smoothing window
function applied thereto." (Chen, Column 4, Lines 50-52 (emphasis added)).

As clearly shown above, Chen discloses applying its 
muting/attenuation/smoothing window function to frames that are to be muted. The 
Final Office Action acknowledges that Chen teaches applying a window function to zero 
or mute frames. (See e.g., Final Office Action, Page 2, Line 21 - Page 3, Line 3). Thus, because Chen
clearly discloses applying a window function to mute frames and the Final Office Action
acknowledges that the window function in Chen is used to mute or zero frames, the
Appellant notes that Chen's teaching of muting frames using a window function is
different than "skipping frames at a rate according to a desired playback speed...
applying a window function to the remaining frames," as recited in Appellant's
independent claims 1 and 6; and "the at least one controller configured to skip frames
at a rate according to a desired playback speed... the at least one controller configured
to apply a window function to the remaining frames," as recited in Appellant's
independent claim 11. Specifically, if Chen's muted frames are to be considered
skipped frames (which they clearly are not), the Appellant notes that applying a window
function to mute or zero frames is different than "skip[ping] frames at a rate according
to a desired playback speed... apply[ing] a window function to the remaining frames,"
as set forth in Appellant's independent claims 1, 6 and 11.

Third, the Appellant notes that the Final Office Action states that "applying a 
window function as taught by Chen to allow for the smoothing of a signal after certain 
frames were removed/muted." (Final Office Action, Page 6, Lines 16-17). However, the 
Appellant notes that nowhere in Chen is there any disclosure regarding removing 
frames. Rather, as noted above, Chen's muted frames are played back as silence 
periods. Further, the Appellant notes that Chen does not teach applying a window 
function after frames are muted. Rather, Chen teaches applying a window function to a 
frame to mute the frame. Thus, the Appellant notes that the Final Office Action 
mischaracterizes Chen's teachings. 

Fourth, the Appellant notes that one of ordinary skill in the art would not combine 
the teachings of Oh and Chen because Oh is related to speed-variable audio playback 
by adding or deleting separated speech source components of an input audio signal 
while Chen provides an audio decoder unit that mutes error frames and merges nearby 
muted frames to extend a silence period between the error frames when the error rate is 
high. Put another way, Chen is unrelated to speeding up an audio playback speed and
Oh is unrelated to muting error frames. Thus, it is unclear how Chen's teaching of 
muting error frames adds to Oh's disclosure. 

The Final Office Action cites to Chen's Column 9, lines 9-38 and Column 10, 
lines 1-26 as the motivation to combine the references. However, with regard to Chen's 
Column 10, lines 1-26, the Appellant notes that the cited section of Chen is unrelated to 
Chen's muting/attenuation/smoothing window function embodiment. Specifically, Chen 
teaches "[i]n the preferred embodiment, a smooth muting with zeros can be applied to 
decline the audio signal at a given rate according to a window function and in an
alternate embodiment, a frame repeat can be performed." Chen's Column 9, line 64
- Column 10, line 26 relates to Chen's frame repeat embodiment and makes no 
mention of using a window function. Thus, because Chen's Column 10, lines 1-26 as 
cited by the Final Office Action is unrelated to Chen's muting/attenuation/smoothing 
window function embodiment, it is unclear how the cited section provides a motivation to 
use Chen's window function in Oh. 

With regard to Chen's Column 9, lines 9-38, the Appellant notes that the cited 
section merely teaches applying a window function to mute a current frame based on an 
error rate. As noted above, Oh is unrelated to muting frames based on an error rate. 
Thus, it is unclear how the cited section provides a motivation to use Chen's window 
function in Oh. 
Basically, the combination of Oh and Chen fails to disclose, for example, at least
the limitations of "skipping frames at a rate according to a desired playback speed...
applying a window function to the remaining frames," as recited in Appellant's
independent claims 1 and 6; and "the at least one controller configured to skip frames
at a rate according to a desired playback speed... the at least one controller configured
to apply a window function to the remaining frames," as recited in Appellant's
independent claim 11. Rather, Oh merely discloses applying a window function to the
audio characteristics component. Nowhere in Oh is there any disclosure regarding
applying a window function to the speech source components not deleted by the speech
source modulating unit of the pitch modulating unit 4. Thus, as acknowledged by the
Final Office Action, Oh fails to disclose "apply[ing] a window function to the remaining
frames," as set forth in Appellant's independent claims 1, 6 and 11. Chen fails to
remedy the deficiencies of Oh in that Chen merely discloses applying a window function
to mute a current frame based on an error rate, which is different than "skip[ping]
frames at a rate according to a desired playback speed... apply[ing] a window function
to the remaining frames," as set forth in Appellant's independent claims 1, 6 and 11.

Therefore, the Appellant maintains that at least the limitations "skipping frames
at a rate according to a desired playback speed... applying a window function to the
remaining frames," as recited in Appellant's independent claims 1 and 6; and "the at
least one controller configured to skip frames at a rate according to a desired playback
speed... the at least one controller configured to apply a window function to the
remaining frames," as recited in Appellant's independent claim 11, are not obvious
over Oh in view of Chen. Accordingly, independent claims 1, 6 and 11 are not 
unpatentable over Oh in view of Chen and are allowable. 



B. Examiner's Response to Arguments 

The Examiner responded to the Appellant's arguments on pages 2-3 of the Final 
Office Action. 



Specifically, the Final Office Action states the following: 



Chen describes that which is well known in the art. Consider the 
inherency of a window function is directed to preserving signal data within 
a window/interval, wherein any data outside the interval is zeroed (or 
muted for an audio signal, and thus skipped). The concept of Oh is 
realized through the teaching of Chen, wherein by applying a window 
function to frames, the zeroing or muting frames is present. Also consider 
that a window function itself can be applied to a speech signal with given 
parameters that allow for elimination of data outside a windowed frame 
(i.e. other frames NOT in the window). Chen thus teaches attenuation of 
the signal outside the windowed area (Chen Col. 9 lines 9-38). Further, 
consider the purpose of a window function in a speech signal in the 
instance where increasing the playback speed alone may have residual 
undesirable effects. Thus the use of a window function as taught by 
Chen, would alleviate any residual effects or noise present after skipping 
frames. This is also consistent with the present invention, wherein both 
Chen and the present invention teach the concept of overlapping as well 
as "smoothing" a signal out (present invention [0028]). Chen in explicitly 
teaches the elimination of errors (residual effects or noise) by the well 
known use of a window function to completely stop any surrounding 
errors. Chen also differentiates between the use of partial and full 
attenuation through window functions (Chen Col. 10 lines 1-26). 
(Final Office Action, Page 2, Line - Page 3, Line 17). The Appellant notes, however,
that the Final Office Action mischaracterizes Chen and the Appellant's claims. 

First, the Appellant notes that Appellant's claims clearly recite that frames are 
skipped prior to applying a window function (i.e. "applying a window function to the 
remaining frames"). Thus, the Final Office Action's allegation that frames are skipped or 
muted by applying a window function to non-muted frames is different than "skipping 
frames at a rate according to a desired playback speed... applying a window function to 
the remaining frames," as recited in Appellant's independent claims 1 and 6; and "the
at least one controller configured to skip frames at a rate according to a desired
playback speed... the at least one controller configured to apply a window function to
the remaining frames," as recited in Appellant's independent claim 11.

Second, Chen teaches applying a window function to a current frame to mute the 
current frame. Thus, if the Final Office Action is interpreting Chen's muted frames to be 
skipped (despite Chen's teaching that all frames are played back), Chen fails to remedy 
the deficiencies of Oh in that the combination of references clearly fail to disclose 
"skip[ping] frames at a rate according to a desired playback speed... apply[ing] a
window function to the remaining frames," as set forth in Appellant's independent
claims 1, 6 and 11. 

Third, with regard to the Final Office Action's allegation that "Chen and the 
present invention teach the concept of overlapping as well as 'smoothing' a signal out," 
the Appellant notes that Chen's overlap teachings are not related to the application of 
window functions, and instead are directed to repeating a previous frame to conceal an
error in the current frame in lieu of soft muting using a window function. (See e.g., 
Chen, Column 9, Lines 24-28 and Lines 64-66; Column 10, Lines 11-19 and 23-26). 
Nowhere in Chen is there any teaching regarding overlap in connection with performing 
a window function. Thus, the Appellant notes that the Final Office Action 
mischaracterizes Chen. 

Fourth, with regard to the Final Office Action's allegation that "Chen also 
differentiates between the use of partial and full attenuation through window functions 
(Chen Col. 10 lines 1-26)," the Appellant notes, as discussed above, that the cited 
section of Chen is unrelated to window functions. Rather, the cited section of Chen 
discusses repeating frames in lieu of soft muting using window functions when there are 
fewer than three or four consecutive error frames. (See e.g., Chen, Column 9, Line 64 -
Column 10, Line 7; Column 10, Lines 23-26). Thus, the Appellant notes that the Final 
Office Action mischaracterizes Chen. 

Accordingly, independent claims 1, 6 and 11 are not unpatentable over Oh in 
view of Chen and are allowable. Furthermore, the Appellant reserves the right to argue 
additional reasons beyond those set forth herein to support the allowability of claims 1,
6 and 11. 
C. Rejection of Dependent Claims 4-5, 9-10 and 14-15

Claims 4-5, 9-10 and 14-15 depend on independent claims 1, 6 and 11,
respectively. Therefore, the Appellant submits that claims 4-5, 9-10 and 14-15 are
allowable over the combination of references cited in the Final Office Action at least for
the reasons stated above with regard to claims 1, 6 and 11.

The Appellant also submits that at least the limitations of "wherein the desired
playback speed is a predefined default value," as recited by the Appellant in claims 4, 9
and 14; and "wherein the desired playback speed is a programmable value," as recited
by the Appellant in claims 5, 10 and 15, are not obvious over Oh in view of Chen.

The Final Office Action alleges that Oh's Column 6, Lines 34-38 teaches 
"wherein the desired playback speed is a predefined default value," as recited by the 
Appellant in claims 4, 9 and 14; and "wherein the desired playback speed is a
programmable value," as recited by the Appellant in claims 5, 10 and 15. The cited 
section of Oh states the following: 

5q: a variable for determining the play-back speed. 

The speed-varied speech signal x(n) is sent to the D/A converter 7 via the 
output buffer 6. In the D/A converter 7, the speech signal x(n) is converted 
into an analog signal which is, in turn, output as an audio-out signal. 

(Oh, Column 6, Lines 34-38). Clearly, nowhere in the cited section of Oh is there any
mention of the playback speed being a predefined default value. Nor does the cited
section of Oh teach that the desired playback speed is a programmable value.
Rather, the cited section of Oh merely teaches that 5q is a variable for determining
play-back speed and x(n) is a speed-varied speech signal. The Appellant notes that Oh's
disclosure fails to teach "wherein the desired playback speed is a predefined default
value," as recited by the Appellant in claims 4, 9 and 14; and "wherein the desired
playback speed is a programmable value," as recited by the Appellant in claims 5, 10
and 15. Further, Chen fails to remedy the deficiencies of Oh. Accordingly, the
Appellant submits that claims 4-5, 9-10 and 14-15 are allowable over the combination of 
references cited in the Final Office Action at least for the above reasons. 

The Appellant also reserves the right to argue additional reasons beyond those 
set forth above to support the allowability of claims 4-5, 9-10 and 14-15. 

D. Rejection of Dependent Claims 16-18

Claims 16, 17 and 18 depend on independent claims 1, 6 and 11, respectively. 
Therefore, the Appellant submits that claims 16, 17 and 18 are allowable over the 
combination of references cited in the Final Office Action at least for the reasons stated
above with regard to claims 1, 6 and 11. 

The Appellant also submits that at least the limitation of "wherein skipping frames
at a rate according to a desired playback speed further comprises skipping frames at a
rate according to a desired playback speed, wherein the frames correspond to time
intervals," as recited by the Appellant in claims 16, 17 and 18, is not obvious over Oh
in view of Chen.
The Final Office Action alleges that Chen's Column 7, Lines 37-55 and Column
9, Lines 9-38 teach "wherein skipping frames at a rate according to a desired
playback speed further comprises skipping frames at a rate according to a desired
playback speed, wherein the frames correspond to time intervals," as recited by the
Appellant in claims 16, 17 and 18. However, nowhere in the cited sections of Chen is
there any mention of "wherein skipping frames at a rate according to a desired playback
speed further comprises skipping frames at a rate according to a desired playback
speed, wherein the frames correspond to time intervals," as recited by the Appellant
in claims 16, 17 and 18. Further, Oh fails to remedy the deficiencies of Chen. 
Accordingly, the Appellant submits that claims 16, 17 and 18 are allowable over the 
combination of references cited in the Final Office Action at least for the above reasons. 

The Appellant also reserves the right to argue additional reasons beyond those 
set forth above to support the allowability of claims 16, 17 and 18. 



II. The Proposed Combination of Oh, Chen and Kizuki Does Not Render Claims 
2-3, 7-8 and 12-13 Unpatentable 

The Appellant turns to the rejection of claims 2-3, 7-8 and 12-13 as being 
unpatentable over Oh in view of Chen, and further in view of Kizuki. 

Claims 2-3, 7-8 and 12-13 depend on independent claims 1, 6 and 11, 
respectively, and Kizuki fails to remedy the previously mentioned deficiencies of Oh in 
view of Chen. Therefore, the Appellant submits that claims 2-3, 7-8 and 12-13 are 
allowable over the combination of references cited in the Final Office Action at least for 
the reasons stated above with regard to claims 1, 6 and 11. 

Accordingly, the Appellant submits that claims 2-3, 7-8 and 12-13 are allowable 
over the combination of references cited in the Final Office Action at least for the above 
reasons. The Appellant also reserves the right to argue additional reasons beyond 
those set forth above to support the allowability of claims 2-3, 7-8 and 12-13. 

CONCLUSION 

For at least the foregoing reasons, the Appellant submits that claims 1-18 are in 
condition for allowance. Reversal of the Examiner's rejection and issuance of a patent 
on the application are therefore requested. 

The Commissioner is hereby authorized to charge $540 (to cover the Brief on 
Appeal Fee) and any additional fees or credit any overpayment to the deposit account 
of McAndrews, Held & Malloy, Ltd., Account No. 13-0017. 



Respectfully submitted, 



Date: 4-NOV-2009 By: /Philip Henry Sheridan/ 

Philip Henry Sheridan 
Reg. No. 59,918 
Attorney for Appellant 



McANDREWS, HELD & MALLOY, LTD. 
500 West Madison Street, 34th Floor 
Chicago, Illinois 60661 
(T) 312 775 8000 
(F) 312 775 8100 

(PHS) 

CLAIMS APPENDIX 
(37 C.F.R. § 41.37(c)(1)(viii)) 

1. A method for speeding up an encoded original audio signal, said original 
audio signal having an original frequency and original playback speed, said method 
comprising: 

receiving the encoded original audio signal; 

retrieving frames of the original audio signal; 

skipping frames at a rate according to a desired playback speed; 
wherein said desired playback speed is greater than the original playback speed; 

applying a window function to the remaining frames; 

converting the signal with the windowed frames from digital to analog 
format; and 

using the original frequency to playback the analog format signal. 



2. The method according to claim 1 wherein the encoded original audio 
signal is encoded in the frequency domain using one of a plurality of encoding schemes, 
the method further comprising frequency-domain decoding of the encoded original 
audio signal. 

3. The method according to claim 2 wherein said decoding comprises: 
decoding said encoded signal using a decoding scheme corresponding to said one of a 
plurality of encoding schemes; applying an inverse transform to the encoded audio 
signal; and applying an inverse window function. 

4. The method according to claim 1 wherein the desired playback speed is a 
predefined default value. 

5. The method according to claim 1 wherein the desired playback speed is a 
programmable value. 

6. A machine-readable storage having stored thereon, a computer program 
having at least one code section that speed up an encoded original audio signal, said 
original audio signal having an original frequency and original playback speed, the at 
least one code section being executable by a machine for causing the machine to 
perform operations comprising: 

receiving the encoded original audio signal; 
retrieving frames of the original audio signal; 
skipping frames at a rate according to a desired playback speed; 
wherein said desired playback speed is greater than the original playback speed; 

applying a window function to the remaining frames; 

converting the signal with the windowed frames from digital to analog 
format; and 

using the original frequency to playback the analog format signal. 

7. The machine-readable storage according to claim 6 wherein the encoded 
original audio signal is encoded in the frequency domain using one of a plurality of 
encoding schemes, the machine-readable storage further comprising code for 
frequency-domain decoding of the encoded original audio signal. 

8. The machine-readable storage according to claim 7 further comprising: 
code for decoding said encoded signal using a decoding scheme corresponding to said 
one of a plurality of encoding schemes; code for applying an inverse transform to the 
encoded audio signal; and code for applying an inverse window function. 

9. The machine-readable storage according to claim 6 wherein the desired 
playback speed is a predefined default value. 




10. The machine-readable storage according to claim 6 wherein the desired 
playback speed is a programmable value. 

11. A system that speeds up an encoded original audio signal, said original 
audio signal having an original frequency and original playback speed, the system 
comprising: 

at least one controller configured to receive the encoded original audio 
signal; 

the at least one controller configured to retrieve frames of the original 
audio signal; 

the at least one controller configured to skip frames at a rate according to 
a desired playback speed; 

wherein said desired playback speed is greater than the original playback speed; 

the at least one controller configured to apply a window function to the 
remaining frames; 

the at least one controller configured to convert the signal with the 
windowed frames from digital to analog format; and 

the at least one controller configured to use the original frequency to 
playback the analog format signal. 

12. The system according to claim 11 wherein the encoded original audio 
signal is encoded in the frequency domain using one of a plurality of encoding schemes, 
the system further comprising code for frequency-domain decoding of the encoded 
original audio signal. 

13. The system according to claim 12 further comprising: the at least one 
controller configured to decode said encoded signal using a decoding scheme 
corresponding to said one of a plurality of encoding schemes; the at least one controller 
configured to apply an inverse transform to the encoded audio signal; and the at least 
one controller configured to apply an inverse window function. 

14. The system according to claim 11 wherein the desired playback speed is a 
predefined default value. 

15. The system according to claim 11 wherein the desired playback speed is a 
programmable value. 

16. The method of claim 1, wherein skipping frames at a rate according to a 
desired playback speed further comprises skipping frames at a rate according to a 
desired playback speed, wherein the frames correspond to time intervals. 

17. The machine-readable storage of claim 6, wherein skipping frames at a 
rate according to a desired playback speed further comprises skipping frames at a rate 
according to a desired playback speed, wherein the frames correspond to time intervals. 

18. The system of claim 11, wherein skipping frames at a rate according to a 
desired playback speed further comprises skipping frames at a rate according to a 
desired playback speed, wherein the frames correspond to time intervals. 

EVIDENCE APPENDIX 
(37 C.F.R. § 41.37(c)(1)(ix)) 

United States Patent No. 5,781,696 ("Oh"), entered into record by the Examiner in 
the May 25, 2007 Office Action. 

United States Patent No. 6,915,263 ("Chen"), entered into record by the Examiner 
in the October 27, 2008 Office Action. 

United States Patent No. 5,684,829 ("Kizuki"), entered into record by the Examiner 
in the October 27, 2008 Office Action. 

RELATED PROCEEDINGS APPENDIX 
(37 C.F.R.§41.37(c)(1)(x)) 

The Appellant is unaware of any related appeals or interferences. 

[57] ABSTRACT 

A speed-variable audio play-back apparatus which includes 
a pitch detecting circuit for separating speech source 
components and audio characteristics from an input audio signal, 
a pitch modulating unit for deleting selected ones of the 
separated speech source components or adding another 
speech source component to the separated speech source 
components depending on a play-back speed, thereby 
adjusting the length of the audio signal to be played back, a 
speech synthesizing circuit for synthesizing the speech 
source components and audio characteristics modulated by 
the pitch modulating unit, thereby outputting a speed-varied 
audio signal; and a main controller for controlling the above 
components in accordance with control signals externally 
applied thereto. With this arrangement, it is possible to play 
back audio stored in a storage medium at an adjusted speed 
while preventing degradation in tone color and loss of audio 
signals from occurring upon varying the play-back speed 
when the audio is played back by an apparatus such as a tape 
player, VTR, multimedia equipment, or computer, so that the 
played-back audio sounds like a person speaking quickly or 
slowly. 

8 Claims, 4 Drawing Sheets 

SPEED-VARIABLE AUDIO PLAY-BACK 
APPARATUS 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a speed-variable audio 
play-back apparatus, and more particularly to a speed-variable 
audio play-back apparatus capable of playing back audio 
stored in a storage medium at an adjusted speed while 
preventing any degradation in tone color or loss of audio 
signals from occurring upon varying the play-back speed 
while the audio (speech) is played back by an audio play-back 
apparatus such as a tape player, VTR, multimedia 
equipment, computer and the like, so that the audio (speech) 
being played back can be heard as when a person speaks 
quickly or slowly. 

2. Description of Related Art 

In tape or video players, generally, the tone color of the 
audio varies when the play-back speed varies. When play- 
back is carried out at a high speed, the audio being played 
back is different from its original audio level, and it is heard 
as a "peep-peep" sound. At a low play-back speed, a sound 
typically called "loosened tape sound", is generated. 

As a conventional method for preventing such 
phenomena, Japanese Patent Laid-open Publication No. 
Heisei 4-168499 (Jun. 16, 1992) discloses a method for 
partially playing back audio (speech) signals read by a 
memory buffer. In accordance with this method, when the 
play-back speed is doubled, audio (speech) signals read by 
the memory buffer are partially played back such that only 
one of its two successive time-slices is played back. 

For example, if the phrase, "I go to school with Jane", is 
played back at a double speed using the above-mentioned 
conventional method, components of the original audio 
respectively corresponding to the shaded portions shown in 
FIG. 1 are eliminated, so that only the speech "I to with 
Jane" is played back. 
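
The time-slice approach described above can be illustrated with a short sketch. It is not taken from the cited Japanese publication; the function name, the fixed slice length and the sample data are assumptions chosen only to show how keeping one of every two successive time-slices halves the duration while discarding the content of the skipped slices.

```python
def keep_alternate_slices(samples, slice_len, keep_every=2):
    """Keep one time-slice out of every `keep_every` successive slices.

    `slice_len` (samples per slice) and `keep_every` are illustrative
    parameters, not values stated in the source text.
    """
    kept = []
    for start in range(0, len(samples), slice_len):
        slice_index = start // slice_len
        if slice_index % keep_every == 0:
            kept.extend(samples[start:start + slice_len])
    return kept


# Twelve samples split into four slices of three; slices 0 and 2 survive.
print(keep_alternate_slices(list(range(12)), slice_len=3))
```

Because whole slices are dropped regardless of what they contain, any speech inside a skipped slice is lost, which is the drawback the passage attributes to the conventional method.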

Since the conventional method plays back only part of the 
speech at a higher play-back speed so as to keep the tone 
color of the speech, the original meaning of the speech is 
lost. As a result, it is very difficult to understand the meaning 
of the speech using the conventional play-back apparatus. 
Furthermore, it makes listeners feel uncomfortable. 

SUMMARY OF THE INVENTION 
Therefore, an object of the invention is to solve the 
above-mentioned problem and to provide a speed-variable 
audio play-back apparatus capable of playing back audio 
stored in a storage medium at an adjusted speed while 
preventing any degradation in tone color and loss of audio 
signals from occurring upon varying the play-back speed 
while the audio (speech) is played back by an audio play- 
back apparatus such as a tape player, VTR, multimedia 
equipment, computer and the like, so that the audio (speech) 
being played back can be heard as when a person speaks 
quickly or slowly. 

In accordance with the present invention, this object is 
accomplished by providing a speed-variable audio play-back 
apparatus comprising: a pitch detecting circuit for 
separating speech source components and audio characteristics 
from an input audio signal; a pitch modulating unit for 
deleting selected ones of the separated speech source 
components or adding another speech source component to the 
separated speech source components depending on a play-back 
speed, thereby adjusting the length of the audio signal 
to be played back; a speech synthesizing circuit for 
synthesizing the speech source components and audio 
characteristics modulated by the pitch modulating unit, thereby 
outputting a speed-varied audio signal; and a main controller 
for controlling the circuits and unit in accordance with 
control signals externally applied thereto, respectively. 

It is preferred that the pitch detecting circuit be provided 
with an analog/digital converter for converting the analog 
input audio signal into a digital audio signal so that the pitch 
detecting circuit detects pitch portions of the audio signal in 
a digital manner. 

It is also preferred that the speech synthesizing circuit be 
provided with a digital/analog converter for converting the 
audio signal, conversion-processed in a digital manner, into 
an analog signal. 

Preferably, the apparatus further comprises a memory unit 
for temporarily storing the initial audio signal and sending 
the stored audio signal to the speech synthesizing circuit so 
that the audio signal is compared with the modulated audio 
signal synthesized by the speech synthesizing circuit. 

It is also preferred that the apparatus further comprises a 
command memory circuit for storing various control signals 
required for a speed-varied audio play-back, receiving con- 
trol signals from the main controller and outputting the 
stored control signals respectively based on the received 
control signals. 

It is also preferred that the pitch detecting circuit extracts 
the speech source components on the basis of the following 
equation: 

$$c(m, \delta) = \sum_{n} \left| x(n + t_{m-1}) - x(n + t_m + \delta) \right|$$

where, 

x(n): the original input signal (the amount of speech on a 
time axis n); 
t_m: the position of the m-th speech source; and 
δ: a tolerance region around t_m. 

Preferably, the pitch modulating unit performs a signal 
modulation by applying a window function, which provides 
a required signal length extending from the position of each 
speech source, to a portion of the audio signal corresponding 
to each audio signal characteristic as expressed by the 
following equation: 

$$x_m(n) = h_m(t_m - n)\, x(n)$$

where, 

x_m(n): the modulated audio signal; 
h_m(n): the window function; 
t_m: the position of each speech source; and 
x(n): the input audio signal (the amount of speech on a 
time axis n). 

Preferably, the speech source synthesizing circuit derives 
a speed-varied speech signal by use of the modulated speech 
source components and audio signal characteristics as 
expressed by the following equation: 

$$x(n) = \sum_{q} \alpha_q\, x_q(n)\, h_q(t_q - n)$$

where, 

x(n): the speed-varied speech signal; 
α_q: a variable for adjusting the amount of synthesized 
speech; 
x_q(n): the modulated audio characteristics (x_q(n) = x_m(n - δ_q)); 
t_q: the position of each modulated speech source; and 
δ_q: a variable for determining the play-back speed. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Other objects and aspects of the invention will become 
apparent from the following description of embodiments 
with reference to the accompanying drawings in which: 

FIG. 1 is a diagram for explaining a conventional 
speed-variable speech play-back system; 

FIG. 2 is a block diagram schematically illustrating a 
speed-variable audio play-back apparatus in accordance 
with the present invention; 

FIG. 3 is a block diagram illustrating a speech production 
model, applied to the present invention, in the form of an 
electronic circuit; 

FIG. 4 is a flow chart illustrating signal processing 
procedures respectively executed by main parts of the speed- 
variable audio play-back apparatus shown in FIG. 2; 

FIG. 5 is a waveform diagram showing waveforms of 
speech sources and audio characteristics separated in an 
analyzing procedure executed by the speed-variable audio 
play-back apparatus of FIG. 2; and 

FIG. 6 is a waveform diagram showing a procedure for 
modulating the speech source by the speed-variable audio 
play-back apparatus of FIG. 2. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

FIG. 2 is a block diagram schematically illustrating a 
speed-variable audio play-back apparatus in accordance 
with the present invention. 

As shown in FIG. 2, the apparatus includes an analog/ 
digital (A/D) converter 1 connected to an audio-in line and 
a program bus. An input buffer 2 is connected to the A/D 
converter 1. The input buffer 2 is also coupled to a data bus 
as well as the program bus. The apparatus further includes 
a pitch detecting circuit 3 connected to both the input buffer 
2 and the program bus, a pitch modulating unit 4 connected 
to both the pitch detecting circuit 3 and the program bus, and 
a speech synthesizing circuit 5 connected to the pitch 
modulating unit 4. The speech synthesizing circuit 5 is also 
coupled to both the program bus and the data bus. The 
apparatus also includes an output buffer 6 connected to both 
the speech synthesizing circuit 5 and the program bus, a 
digital/analog (D/A) converter 7 connected to both the 
output buffer 6 and the program bus, a main controller 8 
connected to the program bus, a command memory circuit 
9, such as read only memory (ROM), connected to the main 
controller 8, and a memory unit 10, such as random access 
memory (RAM), connected to both the program bus and the 
data bus. 

The main controller 8 serves to control the overall system 
of the speed-variable audio play-back apparatus. Command 
languages required to control various parts by the main 
controller 8 are stored in the command memory circuit 9. On 
the other hand, audio data is stored in the memory unit 10. 

Transfer of control signals and transfer of data among the 
blocks are carried out by the program bus and data bus, 
respectively. The program bus serves to transfer a command 
from the main controller 8 to a part to be controlled. The data 
bus serves to receive audio data from the input buffer 2 and 
to temporarily store the received audio signal. Upon a 
speech synthesis, the data bus sends the stored audio data to 
the speech synthesizing circuit 5 so that the audio data can 
be re-synthesized with a modulated speech source signal in 
the speech synthesizing circuit 5. 

Operation of the speed-variable audio play-back apparatus 
having the above-mentioned arrangement according to 
the present invention will now be described. 

The system used in the apparatus according to the present 
invention is based on a speech production model which 
simulates a speaker's vocal organ. In accordance with the 
speech production model, the audio is determined by an 
audio transfer characteristic obtained by a speech source, 
which is an audio production source, and an articulation 
organ such as a tongue, a lip, or teeth. 

In accordance with the speech production model, a flow 
of air emerging from the speaker's lungs generates periodic 
or noisy air vibrations in "a narrow space" defined in the 
voice cord or oral cavity by the tongue, lip, or teeth; that is, 
at the point of articulation. These air vibrations become a 
speech source. The frequency component of the speech 
source is selectively resonated by the influence of the audio 
transfer characteristic determined by the articulation of an 
organ positioned above the voice cord, namely, the vocal 
tract, thereby producing speech. 

Referring to FIG. 3, such a speech production model is 
schematically shown in the form of an electronic circuit. 

This system shown in FIG. 3 is based on the above-mentioned 
speech production model. As shown in FIG. 4, 
the system includes three main parts, namely, an analyzing 
part for separating speech sources and audio characteristics 
from an input signal, a modulating part for processing the 
separated signals at the desired play-back speed, and a 
synthesizing part for performing a signal re-synthesis using 
the modulated signals. 

The modulating part includes a speech source modulating 
unit adapted to process the separated speech source signals 
based on the above-mentioned speech production model, 
and an audio characteristic control unit adapted to perform 
a smoothing process using a window function needed for the 
re-synthesis while maintaining the tone color, namely, the 
audio characteristic. 

The overall operation of this system is constituted by 
procedures of analyzing an input audio signal to vary the 
play-back speed while still maintaining the tone color or 
frequency of the audio signal, separating speech sources and 
audio characteristics from the audio signal based on the 
result of the analysis, processing the separated data at a 
varied play-back speed, and performing a signal re-synthesis 
using the processed data. These procedures are best shown 
in FIG. 4. 

FIG. 4 shows signal processing procedures respectively 
carried out by the main parts of the speed-variable audio 
play-back apparatus shown in FIG. 2. 

The analyzing, modulating and synthesizing parts, which 
are the most important parts of the present invention, 
correspond to the pitch detecting circuit 3, the pitch modulating 
unit 4 and the speech synthesizing circuit 5, respectively. 

The above procedures will now be described in more 
detail in conjunction with the apparatus shown in FIG. 2. 

Once an analog audio signal is input, it is converted into 
a digital signal by the A/D converter 1 and then sent to the 
pitch detecting circuit 3 via the input buffer 2. 

In the procedure executed by the analyzing part, the pitch 
detecting circuit 3 separates the audio signal into a portion 
corresponding to the speech sources and a portion 
corresponding to the audio signal characteristics based on the 
speech production model under the control of the main 
controller 8. The pitch detecting circuit 3 processes the 
separated portions of the audio signal individually. 

In order to derive the position of each speech source from 
the audio signal in this case, a cross-amplitude difference 
c(m, δ), which is indicative of a measured signal difference 
between the (m-1)th speech source and the m-th speech 
source within a tolerance range δ, is defined by the following 
equation (1): 

$$c(m, \delta) = \sum_{n} \left| x(n + t_{m-1}) - x(n + t_m + \delta) \right| \qquad (1)$$

where, 

x(n): an original input signal (the amount of speech on a 
time axis n); 
t_m: the position of the m-th speech source; and 
δ: a tolerance region around t_m. 

The cross-amplitude difference is defined as a measure of 
the similarity between signals by measuring the difference 
between signals using positions of adjacent speech sources 
as reference points. 

Accordingly, the position of the m-th speech source is 
determined as the position t_m where the cross-amplitude 
difference is minimized. As this procedure is repeatedly 
executed for input signals, the speech source components 
can be extracted. 
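
The search described above can be sketched as follows, under stated assumptions. The source does not give the summation limits of equation (1) or say how candidate positions are generated, so the `span` of samples compared, the `nominal` starting guess and both function names are hypothetical; equation (1) is read here as a sum of absolute sample differences.

```python
def cross_amplitude_difference(x, t_prev, t_cand, span):
    """Sum of |x(n + t_prev) - x(n + t_cand)| over n = 0 .. span-1.

    `span` stands in for the summation range, which the source
    does not specify (assumption).
    """
    return sum(abs(x[n + t_prev] - x[n + t_cand]) for n in range(span))


def next_source_position(x, t_prev, nominal, tolerance, span):
    """Return the candidate position nominal + delta, with delta in
    [-tolerance, tolerance], that minimizes the cross-amplitude
    difference against the previous source position `t_prev`."""
    best = None
    for delta in range(-tolerance, tolerance + 1):
        t_cand = nominal + delta
        if t_cand <= t_prev or t_cand + span > len(x):
            continue  # candidate would fall outside the usable signal
        c = cross_amplitude_difference(x, t_prev, t_cand, span)
        if best is None or c < best[0]:
            best = (c, t_cand)
    return None if best is None else best[1]


# A synthetic signal that repeats every 5 samples; starting from a
# deliberately wrong guess of 6, the search settles on the true period.
signal = [0, 3, 1, -2, -1] * 6
print(next_source_position(signal, t_prev=0, nominal=6, tolerance=2, span=5))
```

Repeating the call with each found position as the new `t_prev` would walk through the signal one speech source at a time, which is one way to read "repeatedly executed for input signals".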

FIG. 5 shows waveforms of the speech sources and audio 
characteristics separated in the procedure executed by the 
analyzing part. 

Referring to FIG. 5, it can be seen that general audio 
signals have substantially similar characteristics in 
quasi-stationary time intervals, namely, neighboring short time 
intervals. The longest signal interval involving similar signal 
characteristics is typically called "one pitch". In the 
procedure executed by the analyzing part, a pitch interval of the 
speech source signal is extracted from the input audio signal 
so that it can be used to adjust the audio play-back speed. 

The modulating part executes a procedure for modulating 
the speech source signal and audio characteristic signal 
separated in the above-mentioned analyzing procedure. In 
this regard, the pitch modulating unit 4 includes a speech 
source modulating unit for processing the speech source 
signal, and an audio characteristic control unit for executing 
a smoothing procedure based on the window function 
needed for a re-synthesis while maintaining the tone color, 
namely, the audio characteristic. 

The speech source modulating unit of the pitch modulating 
unit 4 deletes or adds the speech source component 
extracted from the audio signal depending on the play-back 
speed, thereby adjusting the length of the audio signal. This 
will be described in more detail in conjunction with FIG. 6. 

FIG. 6 illustrates an example of the procedure for modulating 
the speech sources by the speed-variable audio play-back 
apparatus shown in FIG. 2. 

Where the audio play-back speed is to be decreased, 
additional speech sources are added while still maintaining 
the interval of neighboring speech sources, thereby 
lengthening audio signals. On the other hand, a doubling of the 
audio play-back speed is achieved by selecting every other 
speech source while still maintaining the interval of 
neighboring speech sources and re-synthesizing the selected 
speech sources using the audio characteristic. 
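
The deletion and addition of speech sources can be illustrated with a minimal sketch. The source describes only doubling (every other speech source selected) and slowing (speech sources added); the general `factor` parameter, the index mapping and the function name below are assumptions made so that both cases fit in one function.

```python
def change_speed(pitch_periods, factor):
    """Select or repeat whole pitch periods to change play-back speed.

    `pitch_periods` is a list with one entry per speech source (for
    example, one list of samples per pitch interval). A factor of 2
    keeps every other entry; a factor of 0.5 repeats each entry.
    Each kept entry is left untouched, so the interval between
    neighboring sources is preserved.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n = len(pitch_periods)
    out_count = int(n / factor)
    # Output entry k is taken from input entry floor(k * factor).
    return [pitch_periods[min(int(k * factor), n - 1)] for k in range(out_count)]


print(change_speed(list("abcdef"), 2))    # double speed: every other source
print(change_speed(list("abc"), 0.5))     # half speed: each source repeated
```

Since whole pitch periods are removed or duplicated rather than resampled, the duration changes while each remaining period keeps its original length, which is consistent with the stated aim of maintaining the tone color.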

The audio characteristic control unit of the pitch modulating 
unit 4 performs a signal modulation by applying a 
window function, which provides a certain signal length 
extending from the position of each speech source, to the 
audio signal portion corresponding to each audio signal 
characteristic as indicated by the following equation (2): 

$$x_m(n) = h_m(t_m - n)\, x(n) \qquad (2)$$

where, 

x_m(n): a modulated audio signal; 
h_m(n): the window function; 
t_m: the position of each speech source; and 
x(n): an input audio signal (the amount of speech on a 
time axis n). 

This procedure produces a smooth audio signal even 
when a signal modulation has been made by a deletion or 
addition of speech sources by a speech synthesis that will be 
described hereinafter. 
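
The windowing step of equation (2) can be sketched as below, reading the equation as the input signal multiplied by a window centered on the speech source position. The source does not name a window shape, so the triangular window, its `half_width` parameter and both function names are assumptions for illustration only.

```python
def triangular_window(k, half_width):
    """Hypothetical window h(k): 1 at k = 0, falling linearly to 0
    at |k| = half_width, and 0 beyond that."""
    return max(0.0, 1.0 - abs(k) / half_width)


def windowed_segment(x, t_m, half_width):
    """Apply the window centered at position t_m to every sample of x,
    i.e. h(t_m - n) * x(n) for each n."""
    return [triangular_window(t_m - n, half_width) * x[n] for n in range(len(x))]


# A constant signal makes the window shape visible in the output.
print(windowed_segment([1.0] * 9, t_m=4, half_width=2))
```

Tapering each segment toward zero at its edges is what lets neighboring segments be joined later without abrupt steps, which matches the "smooth audio signal" the passage refers to.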

Finally, the speech source synthesizing circuit 5, which 
executes a synthesizing procedure, derives a speed-varied 
speech signal x(n) by utilizing the speech source component 
and audio signal characteristic modulated in the modulating 
procedure. The derived speech signal x(n) can be expressed 
by the following equation (3): 

$$x(n) = \sum_{q} \alpha_q\, x_q(n)\, h_q(t_q - n) \qquad (3)$$

where, 

α_q: a variable for adjusting the amount of synthesized 
speech; 
x_q(n): a modulated audio characteristic (x_q(n) = x_m(n - δ_q)); 
t_q: the position of each modulated speech source; and 
δ_q: a variable for determining the play-back speed. 

The speed-varied speech signal x(n) is sent to the D/A 
converter 7 via the output buffer 6. In the D/A converter 7, 
the speech signal x(n) is converted into an analog signal 
which is, in turn, output as an audio-out signal. 

Where audio is played back using the above system, it can 
be heard as when a person speaks quickly or slowly even 
when the play-back speed is varied because the tone color of 
the speech being played back is maintained. 
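
The synthesizing procedure of equation (3) can be loosely illustrated as an overlap-add of already-windowed segments placed at their new positions. This is a sketch under assumptions, not the circuit's actual method: the function name, the representation of each segment as a short list with a start position, and the optional per-segment gains (standing in for the variable that adjusts the amount of synthesized speech) are all hypothetical.

```python
def overlap_add(segments, positions, length, gains=None):
    """Sum windowed segments into one output signal.

    segments  : list of sample lists, assumed already windowed
    positions : output start index for each segment
    length    : number of output samples
    gains     : optional per-segment weights (default 1.0 each)
    """
    out = [0.0] * length
    for q, (segment, start) in enumerate(zip(segments, positions)):
        gain = 1.0 if gains is None else gains[q]
        for k, value in enumerate(segment):
            n = start + k
            if 0 <= n < length:
                out[n] += gain * value  # overlapping samples accumulate
    return out


# Two segments overlapping by one sample.
print(overlap_add([[1, 2], [3, 4]], positions=[0, 1], length=3))
```

Moving the start positions closer together shortens the output and moving them apart lengthens it, while each segment's own shape is unchanged.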

When videos are monitored or retrieved by high speed 
play-back in a VTR player, it is possible to obtain a 
played-back speech while maintaining the original tone 
color, as when a person speaks quickly or slowly, without 
causing listeners to feel uncomfortable by variations in tone 
color or loss of audio signals, both of which occur in existing 
VTR players. 

The present invention is also suitable for high-speed 
scanning in multimedia equipment. This technique will 
become more widely used as the growth of the multimedia 



As is apparent from the above description, the present 
invention provides a speed-variable audio play-back apparatus 
capable of playing back audio or speech stored in a 
storage medium at an adjusted speed while preventing any 
degradation in tone color and loss of audio signals from 
occurring upon varying the play-back speed while the audio 
or speech is played back by an audio play-back apparatus 
such as a tape player, VTR, multimedia equipment, computer 
and the like, so that the audio (speech) being played 
back can be heard as when a person speaks quickly or 
slowly. 

Such effects of the present invention are useful in fields 
associated with design, manufacture and sale of various 
audio play-back apparatus. 



Although the preferred embodiments of the invention 
have been disclosed for illustrative purposes, those skilled in 
the art will appreciate that various modifications and addi- 
tions are possible, without departing from the scope and 
spirit of the invention as disclosed in the accompanying 
claims. 

What is claimed is: 

1. A speed-variable audio play-back apparatus comprising: 

a pitch detecting circuit for separating speech source 
components and audio characteristics from an input 
audio signal; 

a pitch modulating unit for modulating the input audio 
signal by modulating the separated speech source com- 
ponents and the audio characteristics separated by said 
pitch detecting circuit, the separated speech source 
components being modulated by performing one of 
deleting selected ones of the separated speech source 
components and adding at least one of the separated 
speech source components to the separated speech 
source components, depending on a play-back speed, 
thereby adjusting a length of an audio signal to be 
played back; 

a speech synthesizing circuit for synthesizing the speech 
source components modulated by said pitch modulating 
unit and the audio characteristics modulated by said 
pitch modulating unit, thereby producing a speed-varied 
audio signal; and 

a main controller for controlling said pitch detecting 
circuit, said pitch modulating unit, and said speech 
synthesizing circuit in accordance with control signals 
externally applied thereto, respectively. 

2. The speed-variable audio play-back apparatus of claim 
1, wherein the pitch detecting circuit is provided with an 
analog/digital converter for converting the input audio signal 
from an analog audio signal to a digital audio signal so that 
the pitch detecting circuit detects pitch portions of the digital 
audio signal in a digital manner. 

3. The speed-variable audio play-back apparatus of claim 
1, wherein the speech synthesizing circuit is provided with 
a digital/analog converter for converting the speed-varied 
audio signal into an analog signal. 

4. The speed-variable audio play-back apparatus of claim 
1, further comprising a memory unit for temporarily storing 
the input audio signal and sending the stored input audio 
signal to the speech synthesizing circuit so that the audio 
signal is compared with the speed-varied audio signal 
synthesized by the speech synthesizing circuit. 

5. The speed-variable audio play-back apparatus of claim 
1, further comprising a command memory circuit for storing 
various control signals required for producing the 
speed-varied audio signal, receiving control signals from the main 
controller and outputting the stored control signals 
respectively based on the received control signals. 



6. The speed-variable audio play-back apparatus of claim
1, wherein the pitch detecting circuit separates the speech
source components on the basis of the following equation:



where,

x(n): the input audio signal (an amount of speech on a
time axis n);

t_m: a position of an m-th speech source;
δ: a tolerance region around t_m;
c(m,δ): a cross-amplitude difference.

7. The speed-variable audio play-back apparatus of claim
1, wherein the pitch modulating unit modulates the input
audio signal by applying a window function which provides
a required signal length extending from a position of each
separated speech source component to a portion of the input
audio signal corresponding to each audio signal
characteristic as expressed by the following equation:



xjn): the ra 

h_m(n): the window function;

t_m: the position of each separated speech source
component; and

x(n): the input audio signal (an amount of speech on a
time axis n).

8. The speed-variable audio play-back apparatus of claim
1, wherein the speech synthesizing circuit derives the
speed-varied audio signal by use of the modulated separated
speech source components and the modulated audio signal
characteristics as expressed by the following equation:

lcU3xqin)hg'(tq-n) 



x(n): the speed-varied audio signal;
α_q: a variable for adjusting a

x_q(n): the modulated audio characteristics (x_q(n)=x_m(n-
δ_q));

t_q: a position of each modulated separated speech source;
and

δ_q: a variable for determining play-back speed.
(57) ABSTRACT

A multimedia decoder unit having error concealment and
fast muting capabilities. The audio decoder provides error
concealment using a dynamic recovery delay that is based
on the error rate of an input digital bitstream and also uses
frame repeating. The decoder allows fast audio muting
whereby audio can be muted within two audio frames of a
mute signal that immediately freezes the video frame, e.g.,
a channel change. With respect to the dynamic recovery
delay, a template of fixed length is used to inspect the last
frames within the template. If error is found, then the error
sum is used as an index into a length table which provides
a dynamic template length. Error within the dynamic
template length is computed and if larger than a tolerance, the
current frame is muted. This allows the recovery delay to be
adaptive and based on the error rate while still allowing mute
merging. Muting the current frame can be achieved by
repeating the previous frame but the delay data of the last
block of the previous audio frame is added to the first block
of the repeated audio frame to provide a smooth frame
interface. In response to a mute command, the decoder zeros
the audio output bitstream to provide zero frames at
the audio output buffer (AOB). In addition, the decoder also
directly zeros audio frames in the AOB that lie between its
read and write pointers to guarantee that only two frames
of audio are played after the mute signal.

14 Claims, 18 Drawing Sheets 
DIGITAL AUDIO DECODER HAVING
ERROR CONCEALMENT USING A
DYNAMIC RECOVERY DELAY AND FRAME
REPEATING AND ALSO HAVING FAST
AUDIO MUTING CAPABILITIES

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the field of multimedia
electronic systems. More particularly, the present invention
relates to an audio decoder unit for decoding digital
multimedia bitstreams representing audio information.

2. Related Art

Audio/visual (AV) material is increasingly stored,
transmitted and rendered using digital data. Digital video
representation of AV material facilitates its usage with computer
controlled electronics and also facilitates high quality image
and sound reproduction. Digital AV material is typically
compressed ("encoded") in order to reduce the computer
resources required to store and transmit the digital data. The
systems that transmit multimedia content encode and/or
compress the content to use their transmission channel
efficiently because the size of the multimedia content,
especially video, is very large. For instance, in order to more
efficiently broadcast or record audio signals, the amount of
information required to represent the audio signals can be
reduced. In the case of digital audio signals, the amount of
digital information needed to accurately reproduce the
original pulse code modulation (PCM) samples can be reduced
by applying a digital compression process, such as AC3 for
instance, resulting in a digitally compressed representation
of the original sample.

Digital AV material can be encoded using a number of
well known standards including, for example, the AC3 audio
standard, the DV (Digital Video) standard, the MPEG
(Motion Picture Expert Group) standard, the JPEG standard,
the H.261 standard, the H.263 standard and the Motion
JPEG standard to name a few. The encoding standards also
specify the associated decoding processes as well. The
multimedia contents are typically stored on the storage
media and are transmitted as bitstreams which represent
audio for video frames. In particular, the ATSC digital
terrestrial transmission standard adopts the AC3 format for
audio encoding and the MPEG2 format for video encoding.

MPEG is the compression standard for audio, video and
graphics information and includes, for example, MPEG1, 2,
4 and 7. It is standardized in the ISO-IEC/JTC1/SC29/
WG11 documents. MPEG1 is the standard for encoding
audio and video data for storage on CD-ROM devices
(compact disc read only memory). The MPEG1
specification is described in the IS-11393 standard. MPEG2 is the
standard (adopted for ATSC) for encoding, decoding and
transmitting video data for storage media, e.g., DVD (digital
video disc), etc., and also for digital broadcasts. MPEG2
supports interlaced video while MPEG1 does not. Therefore,
MPEG2 is used for high quality video displaying on TV
units. The MPEG2 specification is described in IS-13818.
The MPEG4 standard is used for encoding, decoding and
transmitting audio, video and computer graphics data. It
supports content based bitstream manipulation and
representation. The specification is described in IS14496.
MPEG7 is the standard of the meta information of
multimedia (MM) contents. An example of the meta data is data
that describes or is related to the MM contents, such as
identification and/or other descriptions of the author,
producer information, directors, actors, etc. The MPEG7
standard is currently under standardization, and is in draft form
but available. The draft specifications are described in the
ISO-IEC-JTC1/SC29/WG11 documents.

One problem with using encoded digital audio
information is that errors can occur between the transmission and
reception of the audio data. The decoder unit can detect
when a particular frame of the audio data contains error by
using well known CRC checking schemes. In the past, the
frame having the error would be muted by filling in the
frame with zeros. This is called a hard mute. However, the
hard mute, when played back, causes a very audible pop
sound which is not pleasing to the ear nor does it sound
natural. Therefore, an attenuation function or "window" was
applied to the error frame to soften the mute. However, even
[illegible] associated therewith depending
on the window function applied. Also, hard and soft mutes
[illegible] associated therewith that can
be distinguished by the ear. Therefore, when many error
frames are detected in the same bitstream neighborhood,
the intermittent durations of silence (mutes) followed by
sound (unmute) and silence again (mute) can be very
unappealing to the ear and annoying and can also damage
speaker systems.

Another problem with using encoded digital audio
information [illegible] commands and audio signal
synchronization. For instance, if a user watching a program on
[illegible]
information should stop incident to the channel
[illegible]
However, in conventional systems,
[illegible]
encoding schemes. Also, the amount of
playback time in an audio frame may not be exactly the same
as in a video frame in many encoding standards. Therefore,
[illegible] not be exactly synchronized
in the decoding and playback processes. Secondly, the
channel change operation takes some time to complete
because the AV system needs to parse the bitstream from the
channel and feed the data to corresponding audio and
[illegible] decoders. This results in a situation where the audio
[illegible] slightly delayed during decoding and playback.
When the decoder receives a mute command, it is able
to immediately freeze the video frame, because the video
signal is the master. However, many decoded audio frames
may be stored in the output buffer, resulting in some audio
playback after the video freeze. This is very noticeable to the
ear and confusing because the audio playback coincides with
video frames that are not displayed simultaneously.

SUMMARY OF THE INVENTION

Accordingly, the present invention provides an audio
decoder unit that merges nearby muted ("error") frames to
extend a silence period between the error frames when the
error rate is high. By extending the silence period, a more
natural and less annoying sound results when the bitstream
includes many nearby errors. This mute merging can be
accomplished using a dynamically adjusted recovery delay
period that is adaptive based on the error rate. By extending
the recovery period, mutes are merged, e.g., non-error
frames are muted to provide a longer mute duration. The
present invention also applies a novel frame repeating
technique for frame muting to conceal single silent frame
periods without "pops" or other audio artifacts that result
during single frame muting. In addition, the present
invention provides an effective mechanism for guaranteeing that
only two audio frames are played back incident to a mute
command. This provides a better sounding channel change.

In one embodiment, the present invention provides a
multimedia information decoder unit having error
concealment and fast muting capabilities. The audio decoder
provides error concealment using a dynamic recovery delay that
is based on the error rate of an input digital bitstream and
also uses frame repeating. The decoder also allows fast
audio muting whereby audio can be muted within two audio
frames of a mute signal that immediately freezes the video
frame, e.g., for use in a channel change situation. With
respect to the dynamic recovery delay, a template of fixed
length, e.g., 24 audio frames, is used to inspect the last
frames within the template. If error is found in this fixed
template, then the error sum is used as an index into a length
table which provides a dynamic template length. Error
within the dynamic template length is then computed and if
larger than a prescribed tolerance, the current frame is
muted. This allows the recovery delay to be adaptive and
based on the error rate while still allowing mute merging.
When muting is performed, smoothed muting can be used in
one embodiment and in another embodiment, frame
repeating can be performed.

In cases when only one bad frame appears within a
neighborhood of otherwise good frames, a single frame
mute can be performed. In accordance with the present
invention, muting the current frame can also be achieved by
repeating, in the time domain, the previous frame. In single
frame muting cases, the delay data of the last block of the
previous audio frame is added to the first block of the
repeated audio frame to provide a smooth interface between
the repeated frame. Before the addition, data reordering and
weighting are performed. In response to a mute command
(e.g., incident to a channel change), the decoder zeros the
audio output bitstream to provide zero frames at the
audio output buffer (AOB). In addition, the decoder also
directly zeros audio frames in the AOB that lie between its
read and write pointers to guarantee that only two frames
of audio are played after the mute signal.

More specifically, a first embodiment of the present
invention is drawn to a method for muting a portion of an
encoded bitstream of audio information comprising the steps
of: a) with respect to a current encoded audio frame of the
encoded bitstream, computing a length of a dynamic
template based on an error rate of the encoded bitstream, the
dynamic template encompassing a plurality of previous
encoded frames of the encoded bitstream; b) summing errors
of the plurality of previous encoded frames within the
dynamic template to produce a first error sum; c)
determining if the first error sum exceeds a prescribed tolerance; and
d) adaptively merging muted error frames by muting the
current encoded audio frame provided the first error sum
value exceeds the prescribed tolerance whether or not the
current encoded audio frame has an error. A variation of the
first embodiment further includes a method as described
above wherein the step a) comprises the steps of: a1) with
respect to the current encoded audio frame, summing errors
of a plurality of previous encoded frames encompassed by a
fixed-length template to produce a second error sum; and a2)
using the second error sum as an index to a look-up table to
compute the length of the dynamic template.

A second embodiment of the present invention includes a
method for muting a portion of an encoded bitstream of
audio information comprising the steps of: a) detecting if a
current encoded audio frame of the encoded bitstream
contains an error; and b) provided an error is detected,
repeating a previous decoded audio frame in lieu of the
current encoded audio frame, the step b) comprising the
steps of: b1) obtaining decoded data of the previous audio
frame; b2) generating a repeated audio frame by replicating
the decoded data of the previous audio frame for use in lieu
of the current encoded audio frame; b3) modifying the
repeated audio frame by adding delay information of a last
block of the previous audio frame with pulse code
modulated (PCM) data of a first block of the repeated audio frame
to generate new decoded data for the first block of the
repeated audio frame; and b4) sending the repeated audio
frame to an audio output buffer for playout.
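
Steps b1) through b3) of the second embodiment can be illustrated with a minimal sketch in C. It assumes a single channel of 16-bit PCM, a frame of 6 blocks of 256 samples (the AC3 frame layout described elsewhere in this specification), and a hypothetical buffer `last_block_delay` holding the delay data of the last block of the previous frame. The function name, the buffer names and the saturation step are illustrative assumptions, and the data reordering and weighting that the specification performs before the addition are omitted.

```c
#include <stdint.h>
#include <string.h>

#define BLOCKS_PER_FRAME 6     /* coded audio blocks per frame */
#define SAMPLES_PER_BLOCK 256  /* new audio samples per block */
#define FRAME_SAMPLES (BLOCKS_PER_FRAME * SAMPLES_PER_BLOCK)

/* Clamp a 32-bit sum back into the 16-bit PCM range. */
static int16_t saturate16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/*
 * Build a repeated frame from the previous decoded frame (steps b1, b2)
 * and add the delay data of the previous frame's last block to the
 * first block of the repeated frame (step b3).
 */
void repeat_previous_frame(const int16_t prev_pcm[FRAME_SAMPLES],
                           const int16_t last_block_delay[SAMPLES_PER_BLOCK],
                           int16_t repeated[FRAME_SAMPLES])
{
    /* b1, b2: replicate the decoded data of the previous frame. */
    memcpy(repeated, prev_pcm, FRAME_SAMPLES * sizeof(int16_t));

    /* b3: smooth the frame interface by adding the delay data. */
    for (int i = 0; i < SAMPLES_PER_BLOCK; i++) {
        repeated[i] = saturate16((int32_t)repeated[i] + last_block_delay[i]);
    }
}
```

Only the first block of the repeated frame is modified; the remaining five blocks are plain copies, which is why the repeat is cheap compared with a full decode.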

A third embodiment of the present invention includes a
method (within a digital decoder) for reducing audio frame
over-run comprising the steps of: a) responsive to an audio
mute signal, causing an input audio encoded bitstream to
zero, the step a) causing entries in an audio output buffer to
zero starting from an entry position pointed to by a write
pointer associated with the audio output buffer; and b)
directly zeroing a plurality of entries of the audio output
buffer in response to the audio mute signal, the plurality of
entries being a few entries away from a read pointer of the
audio output buffer, the read pointer following the write
pointer and wherein as a result of step a) and step b), only
a predetermined number of audio output frames are
guaranteed to be played after the audio mute signal is detected.
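
Step b) of the third embodiment can be sketched as follows, under stated assumptions: the audio output buffer is modeled as a ring of `AOB_SLOTS` frame slots, the read pointer trails the write pointer, and `keep` is the predetermined number of frames still allowed to play (two, per the summary above). The structure, the slot count and the function name are hypothetical. Step a), which feeds zero frames in at the write pointer, is not modeled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AOB_SLOTS 8          /* assumed ring size, in frames */
#define FRAME_SAMPLES 1536   /* 6 blocks x 256 samples */

typedef struct {
    int16_t frames[AOB_SLOTS][FRAME_SAMPLES];
    size_t read_idx;   /* next slot to be played out */
    size_t write_idx;  /* next slot to be filled by the decoder */
} AudioOutputBuffer;

/*
 * Directly zero every pending frame that lies more than `keep` slots
 * past the read pointer and before the write pointer.
 * Returns the number of slots that were zeroed.
 */
size_t aob_mute_pending(AudioOutputBuffer *aob, size_t keep)
{
    /* Frames decoded but not yet played. */
    size_t pending = (aob->write_idx + AOB_SLOTS - aob->read_idx) % AOB_SLOTS;
    size_t zeroed = 0;

    for (size_t k = keep; k < pending; k++) {
        size_t slot = (aob->read_idx + k) % AOB_SLOTS;
        memset(aob->frames[slot], 0, sizeof aob->frames[slot]);
        zeroed++;
    }
    return zeroed;
}
```

With five frames pending and `keep` set to 2, the two frames nearest the read pointer still play and the other three are silenced in place, so no more than `keep` audible frames follow the mute signal.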
BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1A and FIG. 1B illustrate an exemplary multimedia
communication system including a transmission system
having an encoder and a reception system having a decoder.

FIG. 2 illustrates an exemplary encoded audio frame of an
encoded digital audio bitstream.

FIG. 3A is a block diagram of an audio decoding system
in accordance with the present invention.

FIG. 3B is a block diagram of an audio decoding system
having an error concealment circuit in accordance with one
aspect of the present invention.

FIG. 4 illustrates steps in a process in accordance with one
embodiment of the present invention for providing a
dynamic error recovery delay with mute merging.

FIG. 5A illustrates a portion of the encoded digital audio
stream and the fixed template used in accordance with the
embodiment of the present invention shown in FIG. 4.

FIG. 5B illustrates a portion of the encoded digital audio
stream and the dynamic template used in accordance with
the embodiment of the present invention of FIG. 4.

FIG. 6 illustrates an audio signal in accordance with a
muted audio frame and a smoothing window function
applied thereto.

FIG. 7 illustrates steps in a process in accordance with an
embodiment of the present invention for performing frame
repeating for a single frame muting operation.

FIG. 8 illustrates a portion of the encoded audio bitstream
having a single frame to be muted in accordance with the
embodiment of the present invention of FIG. 7.

FIG. 9 illustrates the portion of the encoded audio
bitstream of FIG. 8 after frame repeating in accordance with
the embodiment of the present invention of FIG. 7.

FIG. 10 illustrates steps in a process in accordance with
an embodiment of the present invention for reducing the
number of audio frames played back following an audio
mute command.

FIG. 11 is a block diagram of a decoder unit in accordance
with the embodiment of the present invention of FIG. 10.
FIG. 12 illustrates the contents of the audio output buffer
of an audio decoder system.

FIG. 13 illustrates the contents of the audio output buffer
of an audio decoder system after frame zeroing in
accordance with the embodiment of the present invention of FIG.
10.

FIG. 14 illustrates a timing diagram of signals involved in
the embodiment of the present invention of FIG. 10.

FIG. 15 is a block diagram of a computer system platform
on which the error concealment and muting embodiments of
the present invention can be practiced.

DETAILED DESCRIPTION OF THE
INVENTION

In the following detailed description of the present
invention, a digital audio decoder system for a multimedia
information system having improved error concealment
functionality, improved muting capabilities and reduced
audio frame overrun in response to a mute-command,
specific details are set forth in order to provide a
thorough understanding of the present invention. However,
it will be recognized by one skilled in the art that the present
invention may be practiced without these specific details or
with equivalents thereof. In other instances, well known
methods, procedures, components, and circuits have not
been described in detail as not to unnecessarily obscure
aspects of the present invention.

Notation and Nomenclature

Some portions of the detailed descriptions which follow
[illegible]
These descriptions and representations are the means used
by those skilled in the data processing arts to most
effectively convey the substance of their work to others skilled in
the art. A procedure, computer executed step, logic block,
process, etc., is here, and generally, conceived to be a
self-consistent sequence of steps or instructions leading to
[illegible]
necessarily, these quantities take the form of electrical or
magnetic signals capable of being stored, transferred,
combined, compared, and otherwise manipulated in a
computer system. It has proven convenient at times, principally
for reasons of common usage, to refer to these signals as
bits, values, elements, symbols, characters, terms, numbers,
or the like.

It should be borne in mind, however, that all of these and
similar terms are to be associated with the appropriate
physical quantities and are merely convenient labels applied
to these quantities. Unless specifically stated otherwise as
apparent from the following discussions, it is appreciated
that throughout the present invention, discussions utilizing
terms such as "processing" or "computing" or "translating"
or "calculating" or "determining" or "scrolling" or
"displaying" or "recognizing" or the like, refer to the action and
processes of a computer system, or similar electronic
computing device, that manipulates and transforms data
represented as physical (electronic) quantities within the
computer system's registers and memories into other data
similarly represented as physical quantities within the
computer system memories or registers or other such
information storage, transmission or display devices.

Audio Decoder System

Embodiments of the present invention are directed to a
digital audio decoder system 200 as shown in FIG. 1B. FIG.
1A and FIG. 1B together illustrate a multimedia
communication system 10. In accordance with the system 10, a
multimedia encoder system 120 accepts an audio signal
(e.g., PCM audio) 115 and produces an encoded bitstream
122 based thereon. In one embodiment, this is an AC3
format compliant encoded signal which has a frequency of
384 kilobytes per second. This encoded bitstream is
processed by transmission equipment 130 to produce a
modulated signal 124 which can be transmitted 140, e.g., by a
[illegible]
FIG. 1B illustrates the receiving system which includes
receiver hardware 142 (satellite dish, cable receiver, etc.)
and reception equipment 132 capable of converting the
received modulated signal 146 to an encoded digital
bitstream 134. It is appreciated that encoded bitstream 134 may
vary from encoded bitstream 122 as a result of one or more
signal errors that can be a result of transmission/reception
problems. The encoded bitstream 134 is then fed to a digital
decoder unit 200 which generates an output signal 150 that
can be fed to a speaker system for rendering audible signals.
The decoder system 200 of the present invention includes
error concealment circuitry 210 for processing audio frames
that have signal errors therein.

FIG. 2 illustrates a frame 230 of the encoded audio signal
134. An AC3 serial coded audio bitstream 134 is made up of
a sequence of audio synchronization frames ("frames").
Each frame consists of 6 coded audio blocks (AB)
216a-216f each of which represent encoded data of 256 new
audio samples. The samples are made at 48 kHz. When
decoded, each frame 230 represents 32 ms of playback time.
[illegible] to acquire and maintain synchronization.
[illegible] contains parameters describing the coded audio
service. The [illegible] 216 can be followed by an auxiliary data
[illegible] 218. At the end of each frame is an error check field that
includes a CRC (cyclic redundancy check) word 220 for
[illegible] CRC word located in the SI
[illegible] optional and can be included within each
[illegible]

FIG. 3A is a logical block diagram of a decoder system
in accordance with an embodiment of the present invention.
Decoder unit 200 receives an encoded audio bitstream
134 and forwards decoded audio frames to an audio output
buffer AOB 250. The AOB 250 contains several decoded
frames, some of which are required as a result of
audio-[illegible] A read pointer marks the memory position at
which audio frames are removed from the AOB 250 and sent
[illegible]
marks the memory position where new audio frames are
[illegible]

FIG. 3B illustrates a more detailed view of the decoder
unit 200. Decoder unit 200 contains a parser 270, a template
processing unit 280, a decoder processing unit 205 and a
mute/bypass processing unit 290. It is appreciated that the
components of the decoder unit 200 can be realized using
hardware circuitry or can be realized using software. The
decoder processing unit 205 and the mute/bypass processing
unit 290 both are coupled to the AOB 250. The parser 270
scans the input encoded bitstream 134 (which can originate
from an audio code buffer 260). In one embodiment, the
input bitstream 134 is compliant with the ATSC standard
which includes an AC3 encoded bitstream for audio
information. The template processing unit 280 determines
whether or not a current frame is to be muted and therefore
is an error concealment circuit. Template processing unit
280 functions in accordance with the steps of FIG. 4. If a
particular frame is to be muted, that frame is called an error
or mute frame, and is processed by the mute/bypass
processing unit 290.



FIG. 4 illustrates steps in a process 280 in accordance
with one embodiment of the present invention for providing
a dynamic error recovery delay with mute merging. Process
280 can be realized in hardware or it can be realized in
software. As software, process 280 is realized as instruction
code executed by system 112 (FIG. 15). Process 280
operates by muting some non-error audio frames in order to
merge two or more error frames into one longer silence
period. This reduces the amount of annoying intermittent
silence periods followed by sound and silence again in cases
when the error rate is high. In this embodiment of the present
invention, the length of the recovery delay is adaptive and
depends on the amount of accumulated errors found in the
input bitstream 134 (FIG. 3B).

At step 305, a digital audio encoded frame is received by 
the decoder 200 from an input bitstream 134. An exemplary 
input bitstream 134a is shown in FIG. 5A and includes 
encoded frames 22-50. It is appreciated that each encoded 
frame also includes a corresponding array entry of error 
array 370. The individual entries of error array 370 are one 
bit in length and specify whether or not the encoded frame 
associated with the entry contains an error. In one 
embodiment, a "1" indicates an error and a "0" indicates no 
error. For instance, entry 370a corresponds to frame 22 and 
indicates a good frame. Entry 370b corresponds to frame 23
and indicates an error frame while entry 370c corresponds to 
frame 24 and indicates a good frame. 

The error entries of error array 370 can be computed and 
stored by the parser process 270 of the decoder 200. There 
are several ways in which the AC3 data can indicate that 
errors are contained within a frame of encoded data. In one 
method, the decoder 200 can be informed of the error frame 
by the transport system which delivers the data. The data 
integrity can also be checked using the embedded CRC 220 
fields for each encoded frame. Methods for using the CRC 
fields of an encoded frame for error detection are well 
known. Also, well known consistency checks on the 
received bitstream 134 can also be used to indicate that 
errors are present in a particular encoded frame. It is 
appreciated that at step 305 of FIG. 4, any of a number of 
well known processes can be used for generating the error 
array 370 of FIG. 5A based on the input bitstream 134. In the 
example of FIG. 5A, the next audio encoded frame that is 
being processed at step 305 is frame 48. All other frames of
lesser frame number to frame 48 have already been pro- 
cessed by step 305 and are therefore previous frames. 

At step 310 of FIG. 4, a first error sum value
(sum_error1) is computed based on the error array entries of the
last Y frames that were processed by step 305,
including the current frame (e.g., frame 48). In one
embodiment, the value of Y is a constant and can be selected
based on a number of different considerations. In one
implementation, Y=24. Using this example, the first error
sum value is therefore computed based on the error entries
of the error array 370 for frames 25-48. A first error template
360 is shown in FIG. 5A and includes the error entries of the
last 24 frames that were processed by step 305. The first
error template is called the static or fixed error template
because its number of frames is constant. The first error sum
value is therefore the summation of all error entries that lie
within the error template 360. It is appreciated that the first
error template 360 moves along with the decoding process
as new frames are processed by step 305. That is to say, the
frames contained in the first error template 360 become
updated when a new frame is processed by step 305. For
instance, when frame 49 is the next processed frame, the
frames of the error template 360 will include frames 26-49
and so on. It is appreciated that if the current frame contains
an error therein, then the first error sum value (sum_error1)
will always be greater than zero because the current frame
is always included within the first error template 360.
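
The sliding fixed template of step 310 can be illustrated with a short sketch in C. It assumes the error array is held as one byte per frame (nonzero meaning an error frame) and indexed by frame number; the function name and the treatment of frames before index 0 as error-free are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FIXED_TEMPLATE_LEN 24  /* Y = 24 in the implementation described */

/*
 * Sum the error entries of the `len` most recent frames, ending at and
 * including frame `current`. Frames before index 0 are treated as
 * error-free.
 */
int template_error_sum(const uint8_t *error_array, size_t current, size_t len)
{
    int sum = 0;
    size_t start = (current + 1 >= len) ? current + 1 - len : 0;

    for (size_t i = start; i <= current; i++) {
        sum += error_array[i] ? 1 : 0;
    }
    return sum;
}
```

Calling `template_error_sum(error_array, 48, FIXED_TEMPLATE_LEN)` covers frames 25-48, and advancing `current` to 49 shifts the window to frames 26-49, matching the example in the text.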
At step 315, if the first error sum (sum_error1) is greater
than zero, then step 325 is entered; otherwise step 320 is
entered. At step 320, no error was detected in the first
template 360, therefore no muting operations are required
and normal decoding can occur on the current frame. At step
320, a normal decode process is performed on the current
frame (e.g., frame 48). In other words, no muting functions
are applied to the current frame and decoding processes 205
(FIG. 3B) are applied to the current frame. After the
decoding processes, the decoded frame is placed into the audio
output buffer 250 at the position of the write pointer and
eventually played out. Process 280 then returns to step 305
to obtain and process the next encoded frame.

At step 325, errors are detected in the first template 360
and muting operations need to be executed. At step 325, the
value of the first error sum (sum_error1) is used as an index
into a lookup table called the "length table." Although a
variety of different length tables can be used, one exemplary
length table is shown below:

int lengthtab[sum_error1]={1, 1, 1, 1, 20, 19, 18, 17, 16, 15,
14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1}

It is appreciated that the first three entries of the exemplary
length table do not map to 23, 22 and 21 to avoid very long
recovery delays.
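
The lookup of step 325 can be sketched in C as below. The table values are an assumption: they are read from the exemplary listing above, whose first few entries are not fully legible in this copy, and are taken here to be four entries of 1 followed by 20 down to 1. The function name and the clamping of out-of-range indices are likewise illustrative choices rather than part of the described method.

```c
#include <stddef.h>

/* Assumed exemplary table: index is sum_error1, value is the dynamic
 * template length in frames. */
static const int LENGTH_TAB[] = {
    1, 1, 1, 1, 20, 19, 18, 17, 16, 15, 14, 13,
    12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
};
#define LENGTH_TAB_SIZE (sizeof LENGTH_TAB / sizeof LENGTH_TAB[0])

/*
 * Map the fixed-template error sum to a dynamic template length.
 * Indices past the end of the table fall back to a length of 1
 * (an assumption; the text does not say how such a sum is handled).
 */
int dynamic_template_length(int sum_error1)
{
    if (sum_error1 < 0) sum_error1 = 0;
    if ((size_t)sum_error1 >= LENGTH_TAB_SIZE) return 1;
    return LENGTH_TAB[sum_error1];
}
```

A table lookup keeps the mapping from error count to recovery delay programmable: a different delay policy only requires different table contents.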

The length table provides a length for a second error
template that is dynamic in size and depends on the error rate
of the input bitstream 134 as determined by the sum_error1
value. FIG. 5B illustrates an exemplary portion 134b of the
input bitstream and also illustrates an example of the second
error template 380 that spans from the current frame (frame
48) and has a length determined by the above length table.
In this example, the length of the second error template is
five frames long and includes previous frames 44-47 and the
current frame 48. The second error template is called the
dynamic or adaptive template because its length is not fixed
but varies based on the error rate of the input bitstream.
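
How the dynamic template drives the mute decision (steps 330 through 345 of FIG. 4) can be sketched in C as follows. The sketch assumes one byte per frame in the error array, takes the dynamic template length as a parameter rather than looking it up, and uses a hypothetical function name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sum the error entries of the dynamic template, which ends at and
 * includes frame `current` and spans `dyn_len` frames, then compare
 * the sum with the tolerance. Returns true if the current frame is to
 * be muted, whether or not it contains an error itself.
 */
bool should_mute_frame(const uint8_t *error_array, size_t current,
                       size_t dyn_len, int tolerance)
{
    int sum_error2 = 0;

    /* Walk backward from the current frame, stopping at frame 0. */
    for (size_t k = 0; k < dyn_len && k <= current; k++) {
        sum_error2 += error_array[current - k] ? 1 : 0;
    }
    return sum_error2 > tolerance;
}
```

For a five-frame template over frames 44-48 with error entries 1, 0, 1, 1, 0, the sum is 3, so frame 48 is muted for a tolerance of 0 or 1 even though frame 48 itself is error-free. This is the mechanism by which nearby mutes are merged.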
At step 330 of FIG. 4, a second error sum value
(sum_error2) is then computed based on the summation of the error
entries of the error array 370 for the frames of the second
error template 380. In this case, the second error sum value
is 1+0+1+1+0 or 3. It is appreciated that if the current frame
contains an error therein, then the second error sum value
(sum_error2) will always be greater than zero because the
current frame is always included within the second error
template 380. At step 335, a check is made to determine if
the second error sum is greater than a prescribed tolerance.
The tolerance amount is programmable: in one
embodiment, the tolerance amount is 0 and in another
embodiment the tolerance amount is 1. If the second error
sum is greater than the tolerance, then errors are found
within the second error template 380 and at step 345 the
current frame is muted (whether or not the current frame has
an error therein). After muting, step 305 is entered to obtain
and process the next frame. At step 340, the second error
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sum is not greater than the tolerance value and the normal 
decode process is performed on the current frame with an 
applied recovery stage. It is appreciated that step 340 is 
sknilar to step 320 except step 340 includes a recovery stage 



because error concealment, using frame repeating, is barely 
audible in these cases whereas soft muting often creates a 
small audible mute interval. Therefore, in those cases when 
te of the bilstream 134 is not high, e.g., the frames 



because at least oi 

and therefore a recovery from this error is being processed. 
After step 340, step 305 is entered again to obtain and 
process the next encoded frame. 

It is appreciated that if the second error summation is 
greater than the tolerance, then the current frame is skipped lO 
and the output is muted (whether or not the current frame 
contains an error therein), otherwise, the current frame is 
normally decoded and played. In this way, the number of 
transition times from normal play to mute and from mute to 
normal play (unmute) is reduced. In effect, the muting is 
strategy is extended across several non-error frames depend- 
ing on the accumulated error rate so that short routings are 
merged into a long muting. When the error rate is high, 
process 280 acts to merge together adjacent error frames 
(mule merging) by increasing the error recovery delay 20 
period. The amount of mute merging is adaptive and is based 



the first template 360 5 in neighborhood of the 



ir frame have a few tt 



At step 345, a 
be performed to n 



sr of different a 



errors, a single error concealment operation is performed in 
accordance with one embodiment of the present invention. 
A single error concealment operation can be performed by 
repeating the previous non-error frame of the error frame. 
This operation can also be applied to two consecutive frame 
errors that follow a non-error frame. To achieve a smooth 
transition between the repeated frame and the previous 
frame, an overlap-add of the delay of the last block of the 
previous frame and the PCM data of the first block of the 
repeated frame is performed. Also, to achieve a smooth 
transition between the repeated frame and the following 
(next) frame, an overlap-add of the delay of the last block of 
the repeated frame and the PCM data of the first block of the 
following frame is performed. The implementation is per- 
formed in the time-domain rather than the code-domain as a 
result of certain hardware considerations. Frame repeat can 
be performed for two or three frames with errors therein. 
Applying repeating to more than three consecutive error 
frames can create distortion. Therefore, in these cases, the 



ut> umuimvu i« i±.nie the current frame. In the preferred . , . . a 

embodiment, a smooth muting with zeros can be applied to 25 Vroc^^ as described with respect to FIG. 4 

decline the audio signal at a given rate according to a appued. „ ^ j. 

window iTinction and in an alternate embodiment, a frame FIG- ? illustrates steps in a process 440 of one embodi- 
repeal can be performed. FIG. 6 illustrates smooth muting ment of the present mvenUon for repeatmg the previous 
with zeros to reduce the "pop" sounds associated with 
muting. lu 
function 420 



s applied to the decoded audio frame repre- 
sented as signal 410 to decline its amplitude. Windowing 
starts at the zero-cross pomt. The attenuation function rep- 
resents the amount of the original signal 410 allowed to exist 
at any given time and the remainder of the audio signal is 
padded (e.g., replaced) with zeros to provide a mute. 
Smoothing functions and muting using window functions 
are well known. 

The selection of the length of the second template 380 is 
made variable and adapts based on the length table indexed 
by the error occurrence frequency. Under the premise of 
merging intermittent errors over the past frames, the length 
of the second template 380 should be as small as possible to 
minimize error recovery delay. However, two competing 
interests need to be satisfied. On one hand, (a), when the 
length of the second template 380 is lai^e, the benefit is that 
intermittent errors over several frames can be merged into a 
longer mute, but the down side is that error recovery delay 
is longer, On the other hand, (b), when the length is small. 



frame of an error frame to perform error conceahnent. The 
"window" 30 replication is performed in the time domain and special data 
manipulations are performed to produce stnooth signal tran- 
sition at the frame interfaces. Process 440 can be realized in 

hardware or it can be realized in software. As software, 
process 440 is reaUzed as instruction code executed by 
system 112 (FIG. 15). At step 445, a next audio encoded 
frame of information is received from the input bitstreara 
134 and is referenced as the current encoded frame (e.g., 
frame n). A check is made at step 450 to determine if an error 
is present within the current encoded frame. This determi- 
nation can be made by the parser process 270. If the current 
encoded frame indicates that no error is present, then step 
455 is entered where a normal decode of the current encoded 
frame is performed and the decoded audio frame is stored in 
the audio output buffer 250 for playout. Step 455 is analo- 
gous to step 320 of FIG. 4. It is appreciated that the presence 
of an error can be determined at step 450 using the same 
enor detection techniques described with respect to FIG. 4. 

If an error is detected in the current encoded frame, then 
step 460 is entered. At step 460, the decoded version of the 



the down side is that intermittent errors are not merged and 50 previous frame is obtained from the audio output buffer 250 

this causes intermittent sound, but the benefit is that error and the PCM data from blocks 1-5 of the previous frame are 

recovery delay is shorter. To satisfy both of these interests, directly copied and used in place of blocks 1-5 of the current 

the following relationships can be used. To satisfy (a), the encoded frame. 

template length of template 380 plus the sum„errorl should FIG. 8 and FIG. 9 illustrate an example. FIG. 8 illustrates 

be greater than or equal to (Y+1) where Y was the length of 55 a portion 134c of the bitstream including a current encoded 

the fixed template 360. To satisfy (a) and (b), the template frame 512 (having a detected error), a previous decoded 

length of template 380 should be equal to (Y-i-l-sura_ frame 510 (frame n-1) and two next frames 514 (n+1) and 

errorl). These relationships are used to determine the entries 516 (n+2) which remain encoded. At step 460, the PCM 

of the table length lookup table. (pulse code modulation) data associated with blocks 1-5 of 

It is appreciated that process 280, while described with 60 the previous frame 510, e.g., data 5105-510/, are copied^nd 

respect to the AC3 data format, can also be applied to other ' ' 1 - -1-.- - . c 



ts such as MPEG audio, AAC and DV 

Frame Repeat for Single Frame Mutes 
Error concealment can be performed in lieu of soft muting 
in cases where there are only 1 or 2 error frames in a row 



used as the decoded data for the current frame 512. FIG. 9
illustrates this replacement with the PCM data 513b-513f
(of the repeated frame 512') being a direct copy of PCM data
510b-510f of the previous frame 510. The PCM data for the
previous frame 510 is obtained from the audio output buffer
250 because this frame 510 has already been decoded by
decoder 200. As shown in FIG. 9, the resultant modified
current frame 512' (now called the repeated frame) contains
the same PCM data representative of blocks 1-5 as the
previous frame 510.

At step 465 of FIG. 7, the delay data (from the delay
array) associated with the last block (AB5) 510f of the
previous frame 510 is obtained and data shuffling is performed
on this delay data. The delay array is a specified data
structure that is for use in decoding next frames and is specified
by Dolby. FIG. 9 illustrates this delay data as 511. At
step 470, the PCM data associated with the first block (AB0)
510a of the previous frame 510 is accessed. At step 475, well
known weighting functions are applied to the delay data 511
and to the PCM data 510a of steps 465 and 470. At step 480,
the results of the weighting functions are added together (as
shown in FIG. 9) and stored as the resultant PCM data used
for the first block 513a of the repeated frame 512'. Once the
repeated frame 512' has been fully constructed with PCM
data 513a-513f, it is forwarded to the audio output buffer
250 for playout. The result is that the delay data associated
with the last block of the previous frame 510 is added and 
overlapped with the first block of the repeated frame 512' to 
smooth out the interface between these frames. Since only 
the decoded PCM data of the previous frame 510 is used 
above, the compressed code of the previous frame 510 is not 
necessary for process 440 and time-consuming decoding 
processes are not used, but rather, what is used is a weighted 
overlap-add function. It is appreciated that if the next frame 
(frame 514) is also in error, the above process 440 can be 
repeated for this next frame. 
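
The frame repeat of steps 460 through 480 can be sketched in C as follows. This is only an illustration under stated assumptions: the block size of 256 samples, the frame_t layout and the function name repeat_frame are hypothetical, the "well known weighting functions" are replaced here by a simple complementary linear cross-fade, and the data shuffling of the delay array is assumed to have been done by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCKS_PER_FRAME 6      /* audio blocks AB0..AB5, as in the text */
#define BLOCK_SAMPLES    256    /* assumption: samples per audio block */

typedef struct {
    float pcm[BLOCKS_PER_FRAME][BLOCK_SAMPLES];  /* decoded PCM data */
} frame_t;

/* Build a repeated frame from the previous decoded frame.
 * `delay` holds the (already shuffled) delay data of the last block of the
 * previous frame. `out` must not alias `prev`. */
void repeat_frame(const frame_t *prev, const float *delay, frame_t *out)
{
    assert(prev != NULL && delay != NULL && out != NULL && out != prev);

    /* Blocks 1-5 are direct copies of the previous frame's PCM data. */
    memcpy(out->pcm[1], prev->pcm[1],
           sizeof out->pcm[0] * (BLOCKS_PER_FRAME - 1));

    /* Block 0 is a weighted overlap-add of the delay data and the first
     * block of the previous frame (linear cross-fade, an assumption). */
    for (size_t n = 0; n < BLOCK_SAMPLES; n++) {
        float w_in  = (float)n / (float)(BLOCK_SAMPLES - 1);
        float w_out = 1.0f - w_in;
        out->pcm[0][n] = w_out * delay[n] + w_in * prev->pcm[0][n];
    }
}
```

Because only decoded PCM data and the delay data are touched, no decode of the compressed frame is needed, which is the point made in the paragraph above.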

In an alternative embodiment, the same function can be 
applied to the interface between the repeated frame 512' and 
the next frame 514. More specifically, this embodiment of 
the present invention also adds the delay associated with the 
last block (AB5) of the repeated frame 512' with the PCM 
data associated with the first block (AB0) of the next frame
514 (with appropriate data shuffling and weighting) to 
smooth the interface between these frames. 

Reduced Audio Frame Over-Run in Muting 
Operation 

An embodiment of the present invention provides a 
method for reducing the number of audio frames that are 
played out subsequent to a mute command. A mute com- 
mand can arise incident to a channel change command, e.g., 
a viewer decides to change a watched channel from channel 
A to channel B. When the change channel command is 
received by the decoder 200, it immediately freezes the 
current video frame as indicated by the read pointer of the 
video output buffer. The audio output, however, cannot be 
stopped immediately because it is synchronized as the slave 
to the video signal and the respective durations of the audio 
and video frames are different. This embodiment of the 
present invention reduces the number of audio frame
over-runs, that is, the number of audio frames that are played
out subsequent to the video freeze in a channel change.

FIG. 12 illustrates an exemplary audio output buffer 250
containing storage (entries 250a-250g) for at least 7
decoded audio frames (descriptor0-descriptor6). Although
not shown, there is a corresponding video output buffer. The
just decoded audio frames are stored at the write pointer 770
and the audio frames to be played out are read from the read
pointer 760 of the audio output buffer 250. The audio output
buffer 250 is a circular buffer and therefore the read and
write pointers are cyclic. After a read or a write, the
corresponding pointer is incremented by one. There is a
difference between the read and write pointers of about three
to four entries in the buffer 250 to account for the well
known delay or "lag" between the video and the audio
information. Because it can take a relatively long time to
restart the read and write pointers to their proper positions
and update sequences, it is not desirable to halt the read and
write pointers in response to a mute command. If this were
done, there would be a slight delay noticed upon entering the
next channel (e.g., channel B) after a channel change while
these pointers become re-initialized. Therefore, this embodiment
of the present invention provides a method for reducing
audio over-run without halting the operation of the read
and write pointers.
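
The cyclic pointer behavior described above can be sketched in C as follows. The buffer size of 7 entries follows the example of FIG. 12. The function names ring_advance and ring_lag are hypothetical, and the sketch models only the pointer arithmetic, not the buffer contents.

```c
#include <assert.h>

#define NUM_DESCRIPTORS 7u   /* descriptor0..descriptor6, as in FIG. 12 */

/* Advance a read or write pointer by one entry, wrapping at the end. */
unsigned ring_advance(unsigned ptr)
{
    assert(ptr < NUM_DESCRIPTORS);
    return (ptr + 1u) % NUM_DESCRIPTORS;
}

/* Number of entries by which the write pointer leads the read pointer. */
unsigned ring_lag(unsigned read_ptr, unsigned write_ptr)
{
    assert(read_ptr < NUM_DESCRIPTORS && write_ptr < NUM_DESCRIPTORS);
    return (write_ptr + NUM_DESCRIPTORS - read_ptr) % NUM_DESCRIPTORS;
}
```

For instance, a read pointer at entry 0 and a write pointer at entry 4 give a lag of four entries, which is within the three-to-four entry difference mentioned in the text.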
FIG. 11 illustrates a decoder unit 200a in accordance with
this embodiment of the present invention. The decoder unit
200a is similar to the decoder 200 of FIG. 3B except for a
channel change detect logic block 715 which generates
control signals 712 to a first zero block 710 and also
generates control signals 714 to a second zero block 720.

The first zero block 710 is responsible for zeroing the
encoded audio bitstream subsequent to a mute command
received over line 712. By zeroing the input audio encoded
bitstream when a mute command is received, this effectively
will provide zeroed audio frames starting from the position
of the write pointer 770 of the audio output buffer 250. This
is shown by FIG. 13 with frames 4-7 being zeroed. FIG. 13
assumes a mute command was received when the write
pointer 770 was at frame 4. Therefore, the decoder 200a gets
system commands from a command module to either decode
the next audio frame or mute all the preceding frames in the
decoder 200a.

The second zero block 720 also receives a mute command
over line 714 and functions to zero all frames between (1)
the write pointer 770 and (2) two frames above the write
pointer 770. In the example of FIG. 13, the frames that are
zeroed by the second zero block 720 are frame 2 250c and
frame 3 250d. The second zero block 720 does not zero
frame 0 of FIG. 13 because the read pointer 760 may be
pointing to this frame and playing it out when the channel
change occurs. Frame 1 may or may not be windowed to
smoothen the audio. Although frames are zeroed in the audio
output buffer 250, the write and read pointers are allowed to
run normally. FIG. 13 therefore illustrates an exemplary
state of the audio output buffer 250 subsequent to a mute
command in accordance with this embodiment of the present
invention. In this example, at most two decoded audio
frames will be played out subsequent to the mute command
(e.g., frame 0 and frame 1). It is appreciated that two audio
frames (e.g., 64 milliseconds in duration together) is not
typically audible.
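
The action of the second zero block can be sketched in C as follows. This is a minimal illustration, assuming that the frames "above" the write pointer are the two entries written just before it (frames 2 and 3 when the write pointer is at frame 4, as in FIG. 13). The structure audio_out_buf_t, the frame size of 1536 samples and the function name zero_two_before_write are assumptions for this example.

```c
#include <assert.h>
#include <string.h>

#define NUM_FRAMES    7      /* entries in the audio output buffer */
#define FRAME_SAMPLES 1536   /* assumption: 6 blocks of 256 samples */

typedef struct {
    short pcm[NUM_FRAMES][FRAME_SAMPLES];  /* decoded audio frames */
    unsigned read_ptr;                     /* frame being played out */
    unsigned write_ptr;                    /* next frame to be written */
} audio_out_buf_t;

/* Zero the two decoded frames that precede the write pointer. The read
 * and write pointers are deliberately left untouched so that playback
 * and decoding keep running normally. */
void zero_two_before_write(audio_out_buf_t *buf)
{
    assert(buf != NULL && buf->write_ptr < NUM_FRAMES);
    for (unsigned k = 1; k <= 2; k++) {
        unsigned idx = (buf->write_ptr + NUM_FRAMES - k) % NUM_FRAMES;
        memset(buf->pcm[idx], 0, sizeof buf->pcm[idx]);
    }
}
```

If a sampling rate of 48 kHz is assumed, a 1536-sample frame lasts 32 milliseconds, which is consistent with the figure of 64 milliseconds given above for the two frames that may still be played out.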

FIG. 10 illustrates the steps in accordance with this
embodiment of the present invention. Process 600 can be
realized in hardware or it can be realized in software. As
software, process 600 is realized as instruction code
executed by system 112 (FIG. 15). Step 610 loops until a
channel change is detected or otherwise an audio mute is
required. At step 615, the current video frame being output
by the video output buffer 250 is held thereby freezing the
frame on the display device or monitor. At step 620, an audio
bitstream of the decoder is zeroed, thereby causing the
decoded audio data (associated with the old channel) that is
present within the audio buffer 250 to become zeroed
starting at the write pointer position. As discussed above,
this will zero all frames of the audio output buffer 250
starting from the write pointer 770 and counting down the
buffer. At step 625, the decoder directly zeros two decoded
audio frames above the write pointer. A number of well
known windowing functions can be applied to these frames
to perform the zeroing operation. As discussed above, with
respect to FIG. 13, step 625 effectively zeros frames 2 and
3. In the typical case, there are four audio frames between
the read and write pointer 770, so frames 2 and 3 also
represent the second and third frames away from the read
pointer 760. The audio decode and playback processes are
then allowed to operate normally and process 600 returns to
step 610.



The above process 600 can be applied to a number of well
known encoding standards, for example, AC3, AAC,
MPEG-Audio and DV audio. Process 600 can also be
applicable in a situation where the audio interface is supposed
to be transmitting zero data right from the system
boot-up even when there is no actual data (e.g., IEC60958).
A variation of the above approach can be used to implement
that condition. Process 600 can be used by any audio
interfaces transmitting linear PCM data, for instance,
ACLINK, IEC60958, or IIS.

FIG. 14 illustrates a timing diagram of the above operations.
Signal 816 is the channel change command which
simultaneously generates an audio mute signal at the indicated
pulse. Signal 818 represents the state of the decoding
logic and, as shown, it drops down three cycles after the
mute command pulse of signal 816. This represents the input
audio encoded bitstream being zeroed by the first zero logic
710. Signal 820 represents the video display and subsequent
to the mute command pulse of signal 816, it enters a freeze
frame as shown by interval 830. Signal 822 illustrates the
operation of the audio playback of the prior art method and
includes four audio over-run frames 835 that are played out
after the video freeze commences. Signal 824 illustrates the
operation of the audio in accordance with this embodiment
of the present invention. In accordance with signal 824, only
two audio frames 840 are played out subsequent to the start
of the video freeze. By reducing the audio frame over-run by
at least two frames, the present invention is able to eliminate
the annoying and confusing sounds that often result from a
channel change operation of the prior art.

Computer System Platform

Embodiments of the present invention can be implemented
within a computer system. FIG. 15 illustrates a
computer system 112 that can be a general purpose computer
system or it can be an embedded system within an electronic
device, such as an intelligent device, an AV decoder system,
a set-top-box, a receiver unit, a digital television unit, etc.
Computer system 112 includes an address/data bus 100 for
communicating information, a central processor 101
coupled with the bus for processing information and
instructions, a volatile memory 102 (e.g., random access
memory RAM) coupled with the bus 100 for storing information
and instructions for the central processor 101 and
a non-volatile memory 103 (e.g., read only memory ROM)
coupled with the bus 100 for storing static information and
instructions for the processor 101. Computer system 112
also includes a data storage device 104 ("disk subsystem")
such as a magnetic or optical disk and disk drive coupled
with the bus 100 for storing information and instructions and
a display device 105 coupled to the bus 100 for displaying
information to the computer user.

Also included in computer system 112 of FIG. 15 is an
optional alphanumeric input device 106 including alphanumeric
and function keys coupled to the bus 100 for communicating
information and command selections to the
central processor 101. System 112 also includes an optional
cursor control or directing device 107 coupled to the bus for
communicating user input information and command selections
to the central processor 101. The cursor directing
device 107 can be implemented using a number of well
known devices such as a mouse, a track ball, a track pad, an
electronic pad and stylus, an optical tracking device, a touch
screen etc. The display device 105 utilized with the computer
system 112 is optional and may be a liquid crystal
device, cathode ray tube (CRT), field emission device (FED,
also called flat panel CRT) or other display device suitable
for creating graphic images and alphanumeric characters
recognizable to the user.

The preferred embodiment of the present invention, a
digital audio decoder system for a multimedia information
system having improved error concealment functionality,
improved muting capabilities and reduced audio frame over-run
in response to a mute command, is thus described. While
the present invention has been described in particular
embodiments, it should be appreciated that the present
invention should not be construed as limited by such
embodiments, but rather construed according to the below
claims.

What is claimed is:

1. A method for muting a portion of an encoded bitstream
of audio information comprising the steps of:

a) with respect to a current encoded audio frame of said
encoded bitstream, computing a length of a dynamic
template based on an error rate of said encoded
bitstream, said dynamic template encompassing a plurality
of previous encoded frames of said encoded
bitstream;

b) summing errors of said plurality of previous encoded
frames within said dynamic template to produce a first
error sum;

c) determining if said first error sum exceeds a prescribed
tolerance; and

d) adaptively merging muted error frames by muting said
current encoded audio frame provided said first error
sum value exceeds said prescribed tolerance whether or
not said current encoded audio frame has an error.

2. A method as described in claim 1 wherein said step a)
comprises the steps of:

a1) with respect to said current encoded audio frame,
summing errors of a plurality of previous encoded
frames encompassed by a fixed-length template to
produce a second error sum; and

a2) using said second error sum as an index to a look-up
table to compute said length of said dynamic template.

3. A method as described in claim 2 wherein said step a1)
and said step b) are performed using an error array which
contains a respective bit for each encoded audio frame
indicating whether or not an error resides within its associated
encoded audio frame.

4. A method as described in claim 2 wherein said plurality
of previous encoded frames encompassed by said fixed-length
template are measured from and include said current
encoded audio frame and wherein said plurality of previous
encoded frames encompassed by said dynamic template are
measured from and include said current encoded audio
frame.

5. A method as described in claim 4 wherein said fixed- 
length template is 24 audio frames in length and said 
tolerance is 1. 

6. A method as described in claim 2 wherein steps a2), b), 
c) and d) are bypassed if said second error sum is zero. 

7. A method as described in claim 2 wherein said encoded
bitstream of audio information is substantially compliant
with the AC3 digital audio standard.

8. A method for muting a portion of an encoded bitstream
of audio information comprising the steps of:

a) detecting if a current encoded audio frame of said
encoded bitstream contains an error; and

b) provided an error is detected, repeating a previous
decoded audio frame in lieu of said current encoded
audio frame, said repeating comprising the steps of:

b1) obtaining decoded data of said previous audio
frame;

b2) generating a repeated audio frame by replicating 
said decoded data of said previous audio frame for 
use in lieu of said current encoded audio frame; 

b3) modifying said repeated audio frame by adding 
delay information of a last block of said previous 
audio frame with pulse code modulated (PCM) data
of a first block of said repeated audio frame to 
generate new decoded data for said first block of said 
repeated audio frame; and 

b4) sending said repeated audio frame to an audio 
output buffer for playout. 

9. A method as described in claim 8 wherein said step b1)
obtains said decoded data from said audio output buffer. 

10. A method as described in claim 9 wherein said step b3) 
comprises the steps of: 

shuffling and weighting said delay information to generate 
shuffled and weighted delay information; 

weighting said PCM data to generate weighted PCM data; 

adding said shuffled and weighted delay information with 
said weighted PCM data to generate said new decoded
data for said first block of said repeated audio frame. 

11. A method as described in claim 9 further comprising 
the steps of: 

c) provided an error is detected in a next encoded audio
frame immediately following said current encoded
audio frame, repeating said current encoded audio
frame in lieu of said next encoded audio frame; said
step c) comprising the steps of:

c1) obtaining decoded data of said current encoded
audio frame;

c2) generating a second repeated audio frame by rep- 
licating said decoded data of said current encoded 
audio frame for use in lieu of said next encoded 
audio frame; 



c3) modifying said second repeated audio frame by
adding delay information of a last block of said
current encoded audio frame with pulse code modulated
(PCM) data of a first block of said second
repeated audio frame to generate new decoded data
for said first block of said second repeated audio
frame; and

c4) sending said second repeated audio frame to an
audio output buffer for playout.

12. A method as described in claim 8 further comprising 
the steps of: 

c) provided an error rate of said encoded bitstream is high,
performing mute merging, said step c) comprising the
steps of:

c1) with respect to said current encoded audio frame of
said encoded bitstream, computing a length of a
dynamic template based on an error rate of said
encoded bitstream, said dynamic template encompassing
a plurality of previous encoded frames of
said encoded bitstream;

c2) summing errors of said plurality of previous 
encoded frames within said dynamic template to 
produce a first error sum; 

c3) determining if said first error sum exceeds a pre- 
scribed tolerance; and 

c4) adaptively merging muted error frames by muting
said current encoded audio frame provided said first 
error sum value exceeds said prescribed tolerance 
whether or not said current encoded audio frame has 
an error. 

13. A method as described in claim 12 wherein said step 
c1) comprises the steps of:

with respect to said current encoded audio frame, sum- 
ming errors of a plurality of previous encoded frames 
encompassed by a fixed-length template to produce a 
second error sum; and 

using said second error sum as an index to a look-up table 
to compute said length of said dynamic template. 

14. A method as described in claim 8 wherein said
encoded bitstream of audio information is substantially
compliant with the AC3 digital audio standard.

[57] ABSTRACT

A signal coding system capable of high efficiency, high
quality signal coding is provided. Digital signals represented
in the time domain are divided into set time interval data
units and output. One output is converted to a digital signal
represented in the frequency domain, and the other is output
as-is. The energy dispersion of the digital signal represented
in the frequency domain is compared with that of the digital
signal represented in the time domain, and the digital signal
having the least energy dispersion is coded. This coded
digital signal is then multiplexed with an identification
signal to identify it as a frequency domain or time domain
signal, and the resulting multiplexed signal is output.

DIGITAL SIGNAL PROCESSING CODING
AND DECODING SYSTEM

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a signal encoding and decoding
system that compresses and decompresses the information
content of a pulse-code-modulated (PCM) digital signal.

2. Description of the Prior Art

There exists in the art, as a method of converting an
analog audio signal into a digital signal, a time domain
representation method wherein the amplitude of the analog
signal is quantized into discrete units of quantity by sampling
it at fixed time intervals.

With such methods that represent signals in the time
domain only, however, the volume of data in the resulting
digital signal is large, and neither the data transmission
signal band nor the storage capacity of the storage media can
be reduced. There are, therefore, a number of methods that
may be considered for compressing such digital signals in
the time domain.

The main methods for audio signal coding are subband
coding (SBC), in which the signal is coded by dividing the
signal into subbands, and advanced transformation coding
(ATC) in which the signal is coded by adaptive transformation.
In both SBC and ATC coding, an audio signal input as
a time series (time domain) signal is transformed to the
frequency domain, and then coded using the uneven dispersion
of energy across a wide band in the frequency domain.

That is, as shown in FIG. 1(A), the energy of a constant-type
sound signal, such as that produced by a wind or string
instrument, is widely dispersed in the time domain, and a
large amount of data would therefore be required to code it.
If, however, a discrete transform of the above constant-type
tone is taken to convert it to the frequency domain, as shown
in FIG. 1(B), the extent of the energy dispersion is small, and
it therefore requires only a small amount of data to express
[...]

In this coding method, data compression is performed by
allocating a large number of bits (information) to the coding
of frequency bands (subbands) that have a large amount of
energy, and few bits to subbands that have little energy, or
that are audibly unimportant.

FIG. 2 shows an example of a signal coding system that
performs such transform coding, and FIG. 3 shows an
example of a system that decodes the resulting signal. These
systems are described below.

In the system of FIG. 2, an input digital audio signal
(PCM signal) represented in the time domain is supplied to
frame buffer 1, where it is windowed (weighted by a window
function) and output, frame-by-frame.

In the windowing process, window functions such as the
Hanning Window and Hamming Window are applied to the
input audio signal of a continuous time-series signal to
weight its amplitude, and it is divided into "frames," the units
in which subsequent signal processing is performed (see F.
J. Harris, "On the Use of Windows for Harmonic Analysis
with the Discrete Fourier Transform", Proc. IEEE, vol. 66,
no. 1, pp. 51-83, 1978; Mikio Takagi and Haruhisa Shimoda,
"Gazoh Kaiseki Handbook", Tokyo Daigaku Shuppan,
pp. 20-25, 1991).

The output of frame buffer 1 is then supplied frame-by-frame
to discrete transform processor 2, where a discrete
transform such as the discrete cosine transform (DCT),
[...] transform (KLT) etc. is performed on the signal, to
transform it to the frequency domain. Finally, in quantizer-coder 3,
the output of discrete transform processor 2 is converted to a bit
stream in which most of the bits are allocated to portions of
the frequency spectrum that contain large amounts of
energy, or that are audibly important.

[...] transform coding, when the signal is transformed to
the frequency domain, the more uneven the dispersion of
[...] the spectrum, the higher the compression [...]
desirable to use the discrete transform [...] highest
transformation efficiency. The KLT transform [...] highest
"ideal" efficiency, but in terms of practical efficiency
(number of calculations, etc.) it is about [...]. Therefore,
the transform that is normally used is actually the DCT,
which has the highest computation speed.

[...] decoding system shown in FIG. 3, the bit [...]
received at the decoding system input is a digital
signal represented in the frequency domain. This input
[...] supplied to inverse quantizer-decoder 4, where it is
decoded. The output of inverse quantizer-decoder 4 is fed to
inverse discrete transform processor 5, where its inverse
discrete transform is returned to the time domain; i.e., the
inverse discrete cosine transform (IDCT), inverse discrete
Fourier transform (IDFT), or inverse Karhunen-Loeve transform
(IKLT), etc., as applicable, is transformed. The output
[...] inverse discrete transform processor 5 is inverse-windowed
by frame buffer 6, and output as a decoded digital
signal represented in the time domain.

[...] windowing process multiplies each frame of
[...] by the inverse of the function used to window it,
thereby restoring the amplitude of the audio signal to its
original state [...] removing the window components.

[...] information content of a constant-type tone
[...] FIG. 1(A) [...] performing a discrete transform to
translate [...] the frequency domain. As shown in FIG. 4(A),
however, in impulse-type sound signals such as produced by
percussion instruments, the energy dispersal in the time
domain is small and the energy is unevenly distributed. If the
discrete transform of this type of audio signal is taken, to
translate it to the frequency domain, the energy will be
widely dispersed, as shown in FIG. 4(B). This was a problem
with the conventional system, in that for this type of signal,
rather than being improved, the compression efficiency was
actually reduced.

Another problem with this system was that in impulse-type
sound signals, when portions having abrupt energy
changes were coded in the frequency domain, a type of noise
referred to as "pre-echo noise" was produced in the low
energy portions of the signal, degrading the coding quality.

BRIEF SUMMARY OF THE INVENTION

1. Object of the Invention

It is the object of this invention to effect high efficiency,
high quality signal coding and decoding by switching
between the time and frequency domains of the digital
signal, depending on the nature of the input digital signal, to
perform the coding.

2. Brief Summary

Provided, according to a first aspect of this invention, is
a signal coding system for coding an input digital signal,
comprising: a data accumulation means for dividing an input

discrete Fourier transform (I)FI), Kailmnen-Locve trans- digital signal represented in a time domain into set time 
intervals, and outputting said signal; a discrete transform processing means for transforming a digital signal received from said data accumulation means into a digital signal represented in a frequency domain; a discrimination means for determining whether an input digital signal is a constant-type digital signal or an impulse-type digital signal, and for concurrently outputting an identification signal indicating the result of this determination; a coding means that if, based on the identification signal supplied from said discrimination means, said input digital signal is found to be a constant-type signal, codes said digital signal converted by said discrete processing means for representation in said frequency domain, and if said input digital signal is found to be an impulse-type sound-type signal, codes said digital signal represented in said time domain; and a multiplexing means for multiplexing said digital signal coded by said coding means with said identification signal.

Further provided, according to a second aspect of this invention, is a signal coding system for coding input digital signals, comprising: a data accumulation means for dividing an input digital signal represented in a time domain into set time intervals, and outputting said signal; a discrimination means for determining whether a digital signal received from said data accumulation means is a constant-type digital signal or an impulse-type digital signal, and for concurrently outputting an identification signal indicating the result of this determination; a discrete transform processing means for transforming a digital signal received from said discrimination means into a digital signal represented in a frequency domain; a coding means that if, based on the identification signal received from said discrimination means, said input digital signal is found to be a constant-type signal, codes said digital signal converted by said discrete processing means for representation in said frequency domain, and if said input digital signal is found to be an impulse-type signal, codes said digital signal represented in said time domain; and a multiplexing means for multiplexing said digital signal coded by said coding means with said identification signal.

Still further provided, according to a third aspect of this invention, is a signal decoding system for decoding a digital signal divided into set time intervals containing a mixture of digital signals represented in the frequency domain and digital signals represented in the time domain and coded in this mixed state, and also having multiplexed therein, identification signals that identify the content of each time interval as either a time domain or a frequency domain signal, comprising: a separation means for separating an input digital signal into said coded digital signal and said identification signal portions; a decoding means for decoding said coded digital signal received from said separation means; a discrimination means for determining whether a digital signal received from said decoding means is represented in the frequency domain or in the time domain, based on said identification signal received from said separation means; an inverse discrete transform processing means for converting a digital signal received from said discrimination means represented in the frequency domain to a digital signal represented in the time domain; and an output means for outputting, in time series sequence, digital signals represented in the time domain received from said inverse discrete transform processing means, and digital signals represented in the time domain received from said discrimination means.

The invention determines, for each transform coding frame, whether the sound represented therein is constant-type sound or impulse-type sound by comparing the extent of its energy dispersion in the frequency domain with the extent of its energy dispersion in the time domain, and codes the data in the frequency domain if it is constant-type sound, and codes it in the time domain if it is impulse-type sound, and by so doing, improves the coding quality and coding efficiency over that which could be realized by coding in the frequency domain only.

The above and other related objects and features of the invention will be apparent from a reading of the following description of the [illegible] found in the accompanying drawings, and [illegible].

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1(A) is a waveform diagram showing the waveform of a constant-type sound signal, and in particular, the waveform of the signal in the time domain.

FIG. 1(B) is a waveform diagram showing the waveform of a constant-type sound signal, and in particular, the waveform of the signal in the frequency domain.

FIG. 2 is a block diagram showing an example of a signal coding system that performs transform coding.

FIG. 3 is a block diagram showing an example of a signal decoding system.

FIG. 4(A) is a waveform diagram showing the waveform of an impulse-type sound signal, and in particular, the waveform of the signal in the time domain.

FIG. 4(B) is a waveform diagram showing the waveform of an impulse-type sound signal, and in particular, the waveform of the signal in the frequency domain.

FIG. 5 is a block diagram showing one embodiment of the signal coding system of the present invention.

FIG. 6 is a block diagram showing another embodiment of the signal coding system of the present invention.

FIG. 7 is a block diagram showing one embodiment of the signal decoding system of the present invention.

FIG. 8 is a waveform diagram showing an example of a signal having a mixture of constant-type sound and impulse-type sound.

FIG. 9 is a diagram for explaining one example of a time/frequency discriminator as shown in FIG. 5.

FIG. 10 is a diagram for explaining another example of a time/frequency discriminator as shown in FIG. 5.

FIG. 11 is a diagram for explaining one example of an energy dispersion detector as shown in FIG. 6.

FIG. 12 is a diagram for explaining another example of an energy dispersion detector as shown in FIG. 6.

The preferred embodiment of the present invention is described in detail below, based on the accompanying drawings.

An Embodiment of the Signal Coding System

FIG. 5 is a block diagram showing one embodiment of the signal coding system of the present invention. In FIG. 5, the input digital audio signal represented in the time domain is supplied to frame buffer 11, where it is windowed frame-by-frame, and output.

In the windowing process, the input audio signal (a continuous time-series signal) is multiplied by window functions such as the Hanning window and Hamming window to weight its amplitude, and is then divided into "frames,"
which are the data units on which subsequent signal processing will be performed (see F. J. Harris, "On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform", Proc. IEEE, vol. 66, no. 1, pp. 51-83, 1978; Mikio Takagi and Haruhisa Shimoda, "Gazoh Kaiseki Handbook", Tokyo Daigaku Shuppan, pp. 20-25, 1991).

One of the outputs of frame buffer 11 is supplied frame-by-frame to discrete transform processor 12, where a discrete transform such as the discrete cosine transform (DCT), discrete Fourier transform (DFT), Karhunen-Loeve transform (KLT) etc. is performed on the signal to map it to the frequency domain, after which it is output to time/frequency discriminator 13. The other output of frame buffer 11 is sent, still a time domain signal, to time/frequency discriminator 13.

Time/frequency discriminator 13 compares the energy dispersion in the frequency domain signal received from discrete transform processor 12 with the energy dispersion in the time domain signal received directly from frame buffer 11, and outputs, to quantizer-coder 14, the signal having less widely dispersed energy. At the same time, time/frequency discriminator 13 also outputs an identification flag to multiplexer 15 to identify the signal being sent as a time domain signal or a frequency domain signal.

Now, the two signals input to time/frequency discriminator 13 are, as indicated in FIG. 9, a time axis signal x(t) {t=0, 1, . . . , N (where N is the frame length)}, and a frequency axis signal X(f) {f=0, 1, . . . , N (where N is the frame length)}, each having time energy T(t) and frequency energy F(f), respectively, which are given by the following equations:

T(t) = [equation illegible in source]    (1)

F(f) = [equation illegible in source]    (2)

Also, time dispersion TW and frequency dispersion FW are given by the following equations:

TW = \frac{1}{N} \sum_{t} T(t)\,(t - T_{cent})^2    (3)

where Tcent is the center of energy concentration in the time domain.

FW = \frac{1}{N} \sum_{f} F(f)\,(f - F_{cent})^2    (4)

where Fcent is the center of energy concentration in the frequency domain. The frequency dispersion value indicates the frequency variance of the energy content of the frame with respect to the center of energy concentration on the frequency axis.

The magnitudes of the time dispersion TW and frequency dispersion FW determined as described above are then compared, and the signal in the domain having less energy dispersion is output, along with its corresponding flag. That is, if the time dispersion (TW) is less, the time axis signal and flag are output, and if the frequency dispersion (FW) is less, the frequency axis signal and flag are output.

Since time/frequency discriminator 13 of FIG. 5 outputs the signal in the domain having the least energy dispersion (frequency or time), then a) if the input is a constant-type sound signal, the frequency domain signal will be selected, and b) if the input is an impulse-type sound signal, the time domain signal will be selected.

Accordingly, if the system received an audio input signal containing a mix of both constant-type and impulse-type sound-type signals, such as that shown in FIG. 8, the output signal selected for the first and third frames would be the frequency domain signal, and that selected for the second frame output would be the time domain signal.

In quantizer-coder 14, quantization is performed such that portions of the input signal spectrum that have a large amount of energy and portions that are important for auditory perception are allocated most of the available bits, and the resulting signal is then output to multiplexer 15.

Multiplexer 15 multiplexes the time/frequency identification flag received from time/frequency discriminator 13 frame-by-frame with the signal received from quantizer-coder 14, and outputs the result as a bit stream. The time/frequency identification flag consists of one bit header, [illegible] head of the data bits.

[illegible] invention is capable of performing efficient coding of audio signals containing a mixture of constant-type and impulse-type sound components. Also, since impulse-type sound, which gives rise to abrupt energy changes, is coded in the time domain, the disturbances referred to as pre-echo noise do not occur, thus preventing the degradation of quality normally associated therewith.

Another Embodiment of the Signal Coding System

Next, another embodiment of the signal coding system of the present invention will be explained, with reference to FIG. 6. The parts of the system that are the same as in the embodiment that was described above using FIG. 5 are assigned the same reference numbers as in FIG. 5, and are not discussed here.

In FIG. 6, an input digital audio signal represented in the time domain is supplied to frame buffer 11, where it is windowed frame-by-frame, and output. The output of frame buffer 11 is supplied to energy dispersion detector 21.

Energy dispersion detector 21 determines whether the level of energy dispersion in the input digital audio signal is [illegible] a predetermined energy dispersion value (threshold level), and concurrently outputs, to multiplexer 15, a flag indicating which of these two conditions exists.

If the energy dispersion exceeds the threshold level, the signal is determined to represent constant-type sound, in which case the output of energy dispersion detector 21 is [illegible] type sound, in which case the output of energy dispersion detector 21 [illegible] time domain [illegible] quantizer-coder 14.

The signal input to quantizer-coder 14 is quantized and output to multiplexer 15. Multiplexer 15 multiplexes the time/frequency identification flag output from energy dispersion detector 21 frame-by-frame with the signal received from quantizer-coder 14, and outputs the result as a bit stream.

Thus, as explained above, the result obtained in the embodiment of FIG. 6 is the same as in the embodiment of FIG. 5.

Discrimination Methods

In both of the embodiments described above, the determination as to whether the input signal represented constant
or impulse-type sound was made by detecting the amount of
energy dispersion in the digital signal. This determination,
however, may just as well have been performed by other
methods.
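
The energy-dispersion comparison attributed above to time/frequency discriminator 13 might be sketched as follows. This is only an illustration under stated assumptions: energy is taken as the squared sample value, dispersion as the energy-weighted variance of position about the energy centroid (normalised by total energy rather than by N, so that the two domains are comparable), and a naive DCT-II stands in for the discrete transform. The function names and the "T"/"F" flag values are hypothetical and do not come from the specification.

```python
import math

def dct(frame):
    """Naive, unnormalised DCT-II; stands in for discrete transform processor 12."""
    n = len(frame)
    return [sum(x * math.cos(math.pi * (t + 0.5) * k / n) for t, x in enumerate(frame))
            for k in range(n)]

def dispersion(values):
    """Energy-weighted variance of position about the energy centroid (assumed measure)."""
    energy = [v * v for v in values]          # assumed energy definition
    total = sum(energy)
    if total == 0.0:
        return 0.0
    centre = sum(i * e for i, e in enumerate(energy)) / total
    return sum(e * (i - centre) ** 2 for i, e in enumerate(energy)) / total

def discriminate(frame):
    """Return (flag, signal): 'T' keeps the time-domain frame, 'F' its transform."""
    spectrum = dct(frame)
    tw = dispersion(frame)       # time dispersion TW
    fw = dispersion(spectrum)    # frequency dispersion FW
    return ("T", frame) if tw < fw else ("F", spectrum)
```

With these assumptions, a frame holding a single click has all of its energy at one sample and is kept in the time domain, while a frame holding a steady tone concentrates its energy in a few transform coefficients and is passed on in the frequency domain.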

In constant-type audio, for example, the decay curve of
the envelope is usually gradual, and the envelope of an
impulse-type signal has a sharp rising edge.

Alternate Method 1

Accordingly, the differences in the amounts of energy at various frequencies in a digital signal represented in the frequency domain can be determined. A signal with large energy differences can then be classified as a constant-type sound signal, and one in which the differences are not too great as an impulse-type signal. This method can be implemented by simply changing the time/frequency discriminator 13 in the signal coding system of FIG. 5.

In this case, as shown in FIG. 10, a frequency axis signal X(f) {f=0, 1, . . . , N (where N is the frame length)} is input to time/frequency discriminator 13, and the frequency energy F(f) computed using equation (2), above. The total of the energy differences between adjacent frequency components on the frequency axis, FZ, is then calculated, using the following equation:

FZ = [equation illegible in source]    (5)

The total of the energy differences calculated as indicated above is then compared with a threshold level that has been set in advance. If the total of the energy differences FZ is less than the threshold level, the signal is considered impulse-type sound, and it is output as a time axis signal, along with the corresponding flag. Conversely, if the total of the energy differences FZ exceeds the threshold level, the signal is judged as a constant-type sound signal, and is output as a frequency axis signal, along with that flag.

Alternate Method 2

In a digital signal represented in the time domain, the difference between present and preceding amplitudes can be detected, and the difference compared against a set value. Signals in which the difference falls below the threshold level, and those in which the difference falls above it, would then be processed as constant-type and impulse-type sound signals, respectively. The signal coding system for this method can be configured by simply changing the energy dispersion detector 21, as shown in FIG. 6.

In this case, as indicated in FIG. 11, in energy dispersion detector 21, a time axis signal x(t) {t=0, 1, . . . , N (where N is the frame length)} is input, and time axis energy T(n) calculated by the following equation:

T(n) = [equation illegible in source]    (6)

where M is the number of samples corresponding to about 10 ms.

The total of the differences between the average energy levels of adjacent fixed interval samples M (about 10 ms) on the time axis, TZ, is then calculated by the following equation:

TZ = [equation illegible in source]    (7)

The total of the average energy differences TZ, determined as indicated above, is then compared with a threshold level set in advance. If the total of the average energy differences TZ is less than the threshold level, the signal is considered a constant-type sound signal, and is output as a frequency axis signal, along with the corresponding flag. Conversely, if the total of the energy differences TZ exceeds the threshold level, the signal is judged an impulse-type sound signal, and is output as a time axis signal, with that flag.

Alternate Method 3

Another possible method finds the auto-correlation coefficients of the frames of a digital signal represented in the time domain. Those signals with high auto-correlation are then classified as constant-type sound, and those with low auto-correlation as impulse-type sound. With this method as well, it is necessary only to change energy dispersion detector 21 of FIG. 6 to configure the signal coder system.

In this case, as indicated in FIG. 12, in energy dispersion detector 21, a time axis signal x(t) {t=0, 1, . . . , N (where N is the frame length, in bits)} is input, and its auto-correlation coefficient CR(n) is calculated, using the following equation:

CR(n) = [equation illegible in source], n = 1, 2, . . . , N    (8)

x*(t): complex conjugate of x(t)

The magnitude of the coefficient's second peak PCR is

PCR = second peak of CR(n), n = 1, 2, . . . , N    (9)

The magnitude of the detected second peak PCR is then compared with a threshold level set in advance. If the magnitude of the second coefficient peak is less than the threshold level, the signal is considered an impulse-type sound signal, and is output, along with the corresponding flag, as a time axis signal. Conversely, if the magnitude of the second peak of the coefficient PCR is greater than the threshold level, the signal is determined to be a constant-type sound signal, and is output as a frequency axis signal, along with that flag.
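
The three alternate discrimination methods can be illustrated with the short sketch below. Because the defining equations are not legible in this copy, every formula here is an assumption chosen to match the prose: FZ is taken as the sum of absolute differences between adjacent frequency energies, TZ as the sum of absolute differences between the mean energies of adjacent blocks of M samples, and the "second peak" as the first local maximum of the normalised auto-correlation after lag 0. Function names and thresholds are hypothetical.

```python
def method1_is_constant(freq_energy, threshold):
    """Alternate Method 1: large adjacent-frequency energy differences -> constant-type."""
    fz = sum(abs(freq_energy[f] - freq_energy[f - 1])
             for f in range(1, len(freq_energy)))
    return fz > threshold

def method2_is_constant(frame, m, threshold):
    """Alternate Method 2: small changes in block-average energy -> constant-type."""
    blocks = [frame[i:i + m] for i in range(0, len(frame) - m + 1, m)]
    means = [sum(x * x for x in block) / m for block in blocks]
    tz = sum(abs(means[n] - means[n - 1]) for n in range(1, len(means)))
    return tz < threshold

def method3_is_constant(frame, threshold):
    """Alternate Method 3: a strong second auto-correlation peak -> constant-type."""
    n = len(frame)
    cr = [sum(frame[t] * frame[t + lag] for t in range(n - lag)) for lag in range(n)]
    if cr[0] == 0:
        return False
    norm = [c / cr[0] for c in cr]            # lag 0 is the first peak, equal to 1
    peaks = [norm[k] for k in range(1, n - 1)
             if norm[k - 1] < norm[k] >= norm[k + 1]]
    pcr = peaks[0] if peaks else 0.0          # assumed reading of "second peak"
    return pcr > threshold
```

In each case a result of True corresponds to outputting the frequency axis signal with its flag, and False to outputting the time axis signal with its flag.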

An Embodiment of the Signal Decoding System

An embodiment of the signal decoding system of the present invention is shown in FIG. 7 and explained below. This signal decoding system is capable of decoding signals coded by any of the above described coding systems.

The signal decoding system of FIG. 7 is a decoding system for decoding a coded digital audio signal input received as a bit stream.

In FIG. 7, the input signal is supplied to demultiplexer 16, which divides the signal into the data signal and the time/frequency identification flag, which are then fed to inverse quantizer-decoder 17 and time/frequency discriminator 18, respectively.

Inverse quantizer-decoder 17 decodes the data signal and outputs the result to time/frequency discriminator 18.
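
As a rough illustration of the separation performed by demultiplexer 16, the sketch below assumes that each frame in the bit stream carries its one-bit identification flag as the first bit, followed by the coded data bits. The placement of the flag, the flag polarity, and the representation of frames as Python lists of bits are all assumptions made for the example; the specification states only that the flag is a one-bit header.

```python
TIME_DOMAIN = 0        # hypothetical flag values; the polarity is not given in the text
FREQUENCY_DOMAIN = 1

def multiplex(frames):
    """Coder side: prefix each frame's coded bits with its one-bit identification flag."""
    return [[flag] + list(bits) for flag, bits in frames]

def demultiplex(stream):
    """Decoder side: split every frame into its flag and its coded data bits."""
    flags = [frame[0] for frame in stream]
    data = [frame[1:] for frame in stream]
    return flags, data
```

The flags would then go to the discriminator and the data bits to the inverse quantizer-decoder, frame by frame.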
Time/frequency discriminator 18 decides whether the data signal it is receiving from inverse quantizer-decoder 17 is a frequency domain signal or a time domain signal, based on the time/frequency identification flag it receives from demultiplexer 16. If it is a frequency domain signal, it outputs it to inverse discrete transform processor 19, where an inverse discrete transform such as the inverse discrete cosine transform (IDCT), inverse discrete Fourier transform (IDFT), or the inverse Karhunen-Loeve transform (IKLT), is performed on it, to transform it to the time domain, after which it is output to frame buffer 20. If time/frequency discriminator 18 determines that the signal is a time domain signal, it outputs it as-is, directly to frame buffer 20.

Finally, the signal is inverse-windowed in frame buffer 20, and output as a digital audio signal represented in the time domain.

The inverse windowing process multiplies each frame of the signal by the inverse of the function used to window it, thereby restoring the amplitude of the audio signal to its original prewindowing state.

In this manner, the signal decoding system of the present invention can accurately decode a coded audio signal bit stream containing a mixture of frequency domain and time domain signals.

As described above, the signal coding system of the present invention is capable of efficiently coding audio signals that contain a mixture of constant-type sound and impulse-type sound signals. Also, since impulse-type sound data, which gives rise to abrupt energy changes, is coded in the time domain, so-called pre-echo noise does not occur, and the degradation of coding quality associated therewith is prevented.

In addition, since signals with little energy dispersion are selected, they can be used for a vector quantization (VQ) pre-process, utilizing the statistical bias of the spaces to generate the VQ code book.

In the signal decoding system of the present invention, the advantage is the system's capability to accurately decode a coded audio signal bit stream including a mixture of frequency domain and time domain signals.

What is claimed is:

1. A signal coding system for coding an input digital signal, comprising:

a data accumulation means for dividing an input digital signal represented in a time domain into set time intervals, and outputting said signal;

a discrete transform processing means for transforming a digital signal received from said data accumulation means into a digital signal represented in a frequency domain;

a discrimination means for determining whether an input digital signal is a constant-type digital signal or an impulse-type digital signal, and for concurrently outputting an identification signal indicating the result of this determination;

a coding means that if, based on the identification signal received from said discrimination means, said input digital signal is a constant-type signal, codes said digital signal transformed by said discrete transform processing means for representation in said frequency domain, and if said input digital signal is an impulse-type signal, codes said digital signal represented in said time domain; and

a multiplexing means for multiplexing said digital signal coded by said coding means with said identification signal.

2. The signal coding system of claim 1, wherein said discrimination means compares the energy dispersion of said digital signal represented in the frequency domain and received from said discrete transform processing means with the energy dispersion of said digital signal represented in the time domain and received from said data accumulation means, and outputs the one of these two digital signals that has the least energy dispersion.

3. The signal coding system of claim [illegible], wherein said discrimination means determines [illegible] from said discrete transform processing means, and classifies digital signals in which there are large energy differences as constant-type digital signal, and digital signals in which there are small energy differences as impulse-type digital signal.

4. A signal coding system for coding input digital signals comprising:

a data accumulation means for dividing an input digital signal represented in a time domain into set time intervals, and outputting said signal;

a discrimination means for determining whether a digital signal received from said data accumulation means is a constant-type digital signal or an impulse-type digital signal, and for concurrently outputting an identification signal indicating the result of this determination;

a discrete transform processing means for transforming a digital signal received from said discrimination means into a digital signal represented in a frequency domain;

a coding means that if, based on the identification signal received from said discrimination means, said input digital signal is a constant-type signal, codes said digital signal transformed by said discrete processing means for representation in said frequency domain, and if said input digital signal is an impulse-type signal, codes said digital signal represented in said time domain; and

a multiplexing means for multiplexing said digital signal coded by said coding means with said identification signal.

5. The signal coding system of claim 4, wherein said discrimination means detects the difference between the immediately preceding amplitude and the present amplitude in digital signals represented in the time domain and received from said data accumulation means, and classifies digital signals in which said difference is below a predetermined value as constant-type digital signal, and those in which said difference is above said predetermined value as impulse-type digital signal.

6. The signal coding system of claim 4, wherein said discrimination means determines auto-correlation coefficients within frames of said digital signal represented in the time domain and received from said data accumulation means, and classifies digital signals having high auto-correlation as constant-type digital signal, and digital signals having low auto-correlation as impulse-type digital signal.

7. A signal decoding system for decoding a digital signal divided into set time intervals containing a mixture of digital signals represented in the frequency domain and digital signals represented in the time domain and coded in this mixed state, and also having multiplexed therein, identification signals that identify the content of each time interval as either a time domain or a frequency domain signal, comprising:

a separation means for separating an input digital signal into said coded digital signal and said identification signal portions;
a decoding means for decoding said coded digital signal received from said separation means;

a discrimination means for determining whether a digital signal received from said decoding means is represented in the frequency domain or in the time domain, based on said identification signal received from said separation means;

an inverse discrete transform processing means for transforming a digital signal received from said discrimination means represented in the frequency domain to a digital signal represented in the time domain; and

an output means for outputting, in time series sequence, digital signals represented in the time domain received from said inverse discrete transform processing means, and digital signals represented in the time domain received from said discrimination means.



